Tootfinder

Opt-in global Mastodon full-text search. Join the index!

No exact results. Similar results found.
@arXiv_quantph_bot@mastoxiv.page
2025-06-27 09:58:29

Optical response of a binary atomic system with incoherent gain
L. Acevedo, J. Sánchez-Cánovas, M. Donaire
arxiv.org/abs/2506.21177

@arXiv_econGN_bot@mastoxiv.page
2025-06-27 08:10:19

Institutional Noise, Strategic Deviation, and Intertemporal Collapse: A Formal Model of Miner Behaviour under Protocol Uncertainty
Craig Steven Wright
arxiv.org/abs/2506.20992

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:59:49

Potemkin Understanding in Large Language Models
Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan
arxiv.org/abs/2506.21521 arxiv.org/pdf/2506.21521 arxiv.org/html/2506.21521
arXiv:2506.21521v1 Announce Type: new
Abstract: Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this raises an implication: these benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings. Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.

@servelan@newsie.social
2025-06-24 07:08:25

Pasadena officially condemns immigration raids with vote on formal statement – NBC Los Angeles
nbclosangeles.com/news/local/p

@arXiv_qfinMF_bot@mastoxiv.page
2025-05-27 07:50:58

Bulls vs Bears: a Trinomial Model of a Financial Asset
Nahuel I. Arca
arxiv.org/abs/2505.18723 arxiv.org/pdf/2505.187…

@arXiv_csSE_bot@mastoxiv.page
2025-06-26 08:41:20

The Composition of Digital Twins for Systems-of-Systems: a Systematic Literature Review
Mennatullah T. Khedr, John S. Fitzgerald
arxiv.org/abs/2506.20435

@mxp@mastodon.acm.org
2025-06-24 20:24:27

I increasingly think that we need mandatory formal training for PhD students in “research as a profession” (as it exists in other countries, apparently).
Yes, submitting a paper to a conference (in particular as a first author) constitutes a commitment to present the paper if it's accepted. When working with coauthors, no, you can't tacitly assume that they'll take care of it.
And yes, a workshop on Monday from 09:00 to 12:30 takes place during “normal working hours.”

@arXiv_csGR_bot@mastoxiv.page
2025-05-28 07:18:28

Stochastic Preconditioning for Neural Field Optimization
Selena Ling, Merlin Nimier-David, Alec Jacobson, Nicholas Sharp
arxiv.org/abs/2505.20473

@arXiv_statME_bot@mastoxiv.page
2025-06-26 09:13:10

hdbayes: An R Package for Bayesian Analysis of Generalized Linear Models Using Historical Data
Ethan M. Alt, Xinxin Chen, Luiz M. Carvalho, Joseph G. Ibrahim
arxiv.org/abs/2506.20060

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 09:02:50

COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees
Zhiyuan Wang, Jinhao Duan, Qingni Wang, Xiaofeng Zhu, Tianlong Chen, Xiaoshuang Shi, Kaidi Xu
arxiv.org/abs/2506.20178