Tootfinder

Opt-in global Mastodon full text search. Join the index!

@gedankenstuecke@scholar.social
2025-06-09 09:02:06

«Worse, as the latest Apple papers shows, LLMs may well work on your easy test set (like Hanoi with 4 discs) and seduce you into thinking it has built a proper, generalizable solution when it does not.»
The technology that keeps on giving
garymarcus.substack.com/p/a-kn

@arXiv_csDB_bot@mastoxiv.page
2025-07-08 08:51:30

AKEGEN: A LLM-based Tabular Corpus Generator for Evaluating Dataset Discovery in Data Lakes
Zhenwei Dai, Chuan Lei, Asterios Katsifodimos, Xiao Qin, Christos Faloutsos, Huzefa Rangwala
arxiv.org/abs/2507.04687

@cosmos4u@scicomm.xyz
2025-07-07 23:26:34

Exoplanet Atmospheric Refraction Effects in the #Kepler Sample: arxiv.org/abs/2507.02126 -> "We present an analysis on the detection viability of refraction effects in Kepler's exoplanet atmospheres using binning techniques for their light curves in order to compare against simulated refraction effects. We split the Kepler exoplanets into sub-populations according to orbital period and planetary radius, then search for out-of-transit changes in the relative flux associated with atmospheric refraction of starlight. The presence of refraction effects - or lack thereof - may be used to measure and set limits on the bulk properties of an atmosphere, including mean molecular weight or the presence of hazes.
In this work, we use the presence of refraction effects to test whether exoplanets above the period-radius valley have H/He atmospheres, which high levels of stellar radiation could evaporate away, in turn leaving rocky cores below the valley. We find strong observational evidence of refraction effects for exoplanets above the period-radius valley based on Kepler photometry, however those related to optically thin H/He atmospheres are not common in the observed planetary population. This result may be attributed to signal dampening caused by clouds and hazes, consistent with the optically thick and intrinsically hotter atmospheres of Kepler exoplanets caused by relatively close host star proximity."

@arXiv_grqc_bot@mastoxiv.page
2025-07-08 12:51:50

Towards a consistent Semiclassical Theory of Gravity
Francisco Pipa
arxiv.org/abs/2507.05237 arxiv.org/pdf/2507.05237…

@paulbusch@mstdn.ca
2025-07-04 22:42:36

But that is adjustable via the included Ecobee smart thermostat. If we set the backup heat to start at minus 5°C then our furnace could be running for 8 weeks or less. The key is to test and monitor electricity consumption at lower temperatures since the heat pump works harder when it's colder. Then compare our gas and electricity bills to determine the right set-up. Over time we should find the right balance point.

@netzschleuder@social.skewed.de
2025-06-04 10:00:05

hiv_transmission: HIV transmission network (1988-2001)
A set of networks of HIV transmissions between people through sexual, needle-sharing, or social connections, based on combining 8 datasets collected from 1988 to 2001. Metadata includes test results of several diseases, as well as demographic variables such as age, ethnicity, and gender. Networks come in two flavors: egodyads and altdyads. Egodyads are the network among study-participants and their direct partners. Altdyads are the…

hiv_transmission: HIV transmission network (1988-2001). 35229 nodes, 85890 edges. https://networks.skewed.de/net/hiv_transmission

@mgorny@social.treehouse.systems
2025-07-01 16:20:14

One of the goals I've set for further development of #Python eclasses in #Gentoo was to avoid needless complexity. Unfortunately, the subject matter sometimes requires it. However, many of the functions added lately had already been done manually in ebuilds for years.
We started disabling plugin autoloading years ago. First we did that just for individual packages that caused issues. Then for those where tests ended up being really slow. Finally, pretty much anywhere `python_test()` was declared. Doing it all manually was particularly cumbersome; all I needed for `EPYTEST_PLUGINS` was a good idea of how to generalize it.
Similarly, `EPYTEST_XDIST` was added after we had been manually adding `epytest -p xdist -n "$(makeopts_jobs)" --dist=worksteal`, and while at it, I added `EPYTEST_JOBS` to override the job count.
Perhaps `EPYTEST_TIMEOUT` wasn't that common. However, it was meant to help CI systems that could otherwise get stuck on a hanging test.
Similarly, "standard library" version (like `3.9`) matching in `python_gen_cond_dep` was added after a long period of explicitly stating `python3_9 pypy3`. As an extra benefit, this also resolved the problem that, at the time, `pypy3` could mean different Python versions.
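For readers unfamiliar with what disabling plugin autoloading involves, here is a minimal Python sketch of the manual pattern the post describes. It is not the eclass implementation: the helper name `build_pytest_invocation` and the plugin name `example_plugin` are hypothetical, and the job count is passed in directly rather than taken from `makeopts_jobs`. It relies only on pytest's `PYTEST_DISABLE_PLUGIN_AUTOLOAD` environment variable, the `-p` option, and the pytest-xdist flags quoted in the post.

```python
import os


def build_pytest_invocation(plugins=(), xdist_jobs=None, base_env=None):
    """Return (env, argv) for a pytest run with plugin autoloading disabled.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    # pytest skips automatic entry-point plugin discovery when this is set.
    env["PYTEST_DISABLE_PLUGIN_AUTOLOAD"] = "1"

    argv = ["pytest"]
    # With autoloading off, every wanted plugin has to be named explicitly.
    for name in plugins:
        argv += ["-p", name]
    if xdist_jobs is not None:
        # Same flags as the manual call quoted in the post.
        argv += ["-p", "xdist", "-n", str(xdist_jobs), "--dist=worksteal"]
    return env, argv


# "example_plugin" is a placeholder name, not a real plugin.
env, argv = build_pytest_invocation(
    plugins=["example_plugin"], xdist_jobs=4, base_env={}
)
print(" ".join(argv))
```

Repeating this in every package is the kind of boilerplate that, per the post, variables such as `EPYTEST_PLUGINS` and `EPYTEST_XDIST` were added to generalize.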

@jerome@jasette.facil.services
2025-06-04 13:46:46

No quote post yet
Remote fetching of replies is marked as experimental and is also asynchronous; it is unclear whether it will be ready for the final release. mastodon.social/@MastodonEngin

@arXiv_statME_bot@mastoxiv.page
2025-06-03 17:27:08

This arxiv.org/abs/2410.12201 has been replaced.
link: scholar.google.com/scholar?q=a

@arXiv_econEM_bot@mastoxiv.page
2025-07-03 08:26:40

Uniform Validity of the Subset Anderson-Rubin Test under Heteroskedasticity and Nonlinearity
Atsushi Inoue, Òscar Jordà, Guido M. Kuersteiner
arxiv.org/abs/2507.01167

@jochenlingelba1@h-net.social
2025-06-30 08:35:48

WhichYear 6/30/25
3566 pts
9.0 avg. years off
3️⃣ ⚪ ⚪ ⚪ 1️⃣
whichyr.com

@arXiv_eessAS_bot@mastoxiv.page
2025-06-06 09:39:10

This arxiv.org/abs/2505.22251 has been replaced.
initial toot: mastoxiv.page/@arXiv_ees…

@Xexyz@mastodon.me.uk
2025-05-15 14:23:32

Test Drive Unlimited: a quieter Horizon Festival
I found TDU in a box in the loft a couple of weeks ago. Since I now have a new Xbox 360 set up (replacing my old one, which didn't have HDMI output) and can play games that are not compatible with the Xbox One, I thought I'd revisit the game. Unfortunately my save wasn't in the cloud, and wasn't transferred while I had both consoles set up, so I needed to start from the beginning.

@netzschleuder@social.skewed.de
2025-05-31 13:00:05

hiv_transmission: HIV transmission network (1988-2001)
A set of networks of HIV transmissions between people through sexual, needle-sharing, or social connections, based on combining 8 datasets collected from 1988 to 2001. Metadata includes test results of several diseases, as well as demographic variables such as age, ethnicity, and gender. Networks come in two flavors: egodyads and altdyads. Egodyads are the network among study-participants and their direct partners. Altdyads are the…

hiv_transmission: HIV transmission network (1988-2001). 35229 nodes, 85890 edges. https://networks.skewed.de/net/hiv_transmission

@arXiv_csCR_bot@mastoxiv.page
2025-07-01 07:40:43

In-context learning for the classification of manipulation techniques in phishing emails
Antony Dalmiere (LAAS-TRUST, LAAS), Guillaume Auriol (LAAS-TRUST, INSA Toulouse), Vincent Nicomette (LAAS-TSF, LAAS), Pascal Marchand (LERASS)
arxiv.org/abs/2506.22515

@arXiv_mathST_bot@mastoxiv.page
2025-06-13 08:40:00

A note on the properties of the confidence set for the local average treatment effect obtained by inverting the score test
Ezequiel Smucler, Ludovico Lanni, David Masip
arxiv.org/abs/2506.10449

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:56:59

Text2Cypher Across Languages: Evaluating Foundational Models Beyond English
Makbule Gulcin Ozsoy, William Tai
arxiv.org/abs/2506.21445

@arXiv_econEM_bot@mastoxiv.page
2025-05-28 07:22:10

Conditional Method Confidence Set
Lukas Bauer, Ekaterina Kazak
arxiv.org/abs/2505.21278 arxiv.org/pdf/2505.21278

@arXiv_csNE_bot@mastoxiv.page
2025-06-25 07:44:39

A Unified Platform to Evaluate STDP Learning Rule and Synapse Model using Pattern Recognition in a Spiking Neural Network
Jaskirat Singh Maskeen, Sandip Lashkare
arxiv.org/abs/2506.19377

@arXiv_csIT_bot@mastoxiv.page
2025-06-13 07:48:40

Optimal Non-Adaptive Group Testing with One-Sided Error Guarantees
Daniel McMorrow, Jonathan Scarlett
arxiv.org/abs/2506.10374

@arXiv_csCY_bot@mastoxiv.page
2025-06-11 07:24:43

Evaluation of Machine Learning Models in Student Academic Performance Prediction
A. G. R. Sandeepa, Sanka Mohottala
arxiv.org/abs/2506.08047

@arXiv_condmatstatmech_bot@mastoxiv.page
2025-06-19 09:27:32

Benchmarks for protocol control in nonequilibrium statistical mechanics
Stephen Whitelam, Corneel Casert, Megan Engel, Isaac Tamblyn
arxiv.org/abs/2506.15122

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 16:55:39

This arxiv.org/abs/2410.08911 has been replaced.
initial toot: mastoxiv.page/@arXiv_csSE_…

@arXiv_econEM_bot@mastoxiv.page
2025-06-26 08:07:30

A Sharp and Robust Test for Selective Reporting
Stefan Faridani
arxiv.org/abs/2506.20035 arxiv.org/pdf/2506.20035

@arXiv_eessIV_bot@mastoxiv.page
2025-06-23 11:47:30

Proportional Sensitivity in Generative Adversarial Network (GAN)-Augmented Brain Tumor Classification Using Convolutional Neural Network
Mahin Montasir Afif, Abdullah Al Noman, K. M. Tahsin Kabir, Md. Mortuza Ahmmed, Md. Mostafizur Rahman, Mufti Mahmud, Md. Ashraful Babu
arxiv.org/abs/2506.17165

@arXiv_mathST_bot@mastoxiv.page
2025-06-03 07:37:24

Detecting non-uniform patterns on high-dimensional hyperspheres
Tiefeng Jiang, Tuan Pham
arxiv.org/abs/2506.00444 arx…

@arXiv_quantph_bot@mastoxiv.page
2025-06-12 09:50:22

Effective criteria for entanglement witnesses in small dimensions
Łukasz Grzelka, Łukasz Skowronek, Karol Życzkowski
arxiv.org/abs/2506.09298

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:59:49

Potemkin Understanding in Large Language Models
Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan
arxiv.org/abs/2506.21521 arxiv.org/pdf/2506.21521 arxiv.org/html/2506.21521
arXiv:2506.21521v1 Announce Type: new
Abstract: Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this raises an implication: these benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings. Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower-bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.

@arXiv_csIR_bot@mastoxiv.page
2025-06-13 07:38:00

Towards Understanding Bias in Synthetic Data for Evaluation
Hossein A. Rahmani, Varsha Ramineni, Nick Craswell, Bhaskar Mitra, Emine Yilmaz
arxiv.org/abs/2506.10301

@arXiv_mathFA_bot@mastoxiv.page
2025-06-16 08:14:49

A Diestel-Faires type result for multimeasures
José Rodríguez
arxiv.org/abs/2506.11872 arxiv.org/pdf/2506…

@arXiv_astrophGA_bot@mastoxiv.page
2025-06-12 08:41:31

Bar-driven dispersal of Galactic substructure
Adam M. Dillamore, Jason L. Sanders
arxiv.org/abs/2506.09117 arxiv.org/…

@netzschleuder@social.skewed.de
2025-06-15 12:00:04

hiv_transmission: HIV transmission network (1988-2001)
A set of networks of HIV transmissions between people through sexual, needle-sharing, or social connections, based on combining 8 datasets collected from 1988 to 2001. Metadata includes test results of several diseases, as well as demographic variables such as age, ethnicity, and gender. Networks come in two flavors: egodyads and altdyads. Egodyads are the network among study-participants and their direct partners. Altdyads are the…

hiv_transmission: HIV transmission network (1988-2001). 35229 nodes, 85890 edges. https://networks.skewed.de/net/hiv_transmission

@arXiv_quantph_bot@mastoxiv.page
2025-06-12 10:13:01

Certifying asymmetry in the configuration of three qubits
Abdelmalek Taoutioui, Gábor Drótos, Tamás Vértesi
arxiv.org/abs/2506.09939

@arXiv_econEM_bot@mastoxiv.page
2025-05-30 07:21:50

Evaluating financial tail risk forecasts: Testing Equal Predictive Ability
Lukas Bauer
arxiv.org/abs/2505.23333 arxiv…

@arXiv_csCL_bot@mastoxiv.page
2025-06-17 09:53:05

Refract ICL: Rethinking Example Selection in the Era of Million-Token Models
Arjun R. Akula, Kazuma Hashimoto, Krishna Srinivasan, Aditi Chaudhary, Karthik Raman, Michael Bendersky
arxiv.org/abs/2506.12346