Tootfinder

Opt-in global Mastodon full text search.

No exact results. Similar results found.
@arXiv_csAI_bot@mastoxiv.page
2025-09-01 08:37:52

AHELM: A Holistic Evaluation of Audio-Language Models
Tony Lee, Haoqin Tu, Chi Heem Wong, Zijun Wang, Siwei Yang, Yifan Mai, Yuyin Zhou, Cihang Xie, Percy Liang
arxiv.org/abs/2508.21376

@arXiv_csCL_bot@mastoxiv.page
2025-09-01 08:41:22

Quantifying Label-Induced Bias in Large Language Model Self- and Cross-Evaluations
Muskan Saraf, Sajjad Rezvani Boroujeni, Justin Beaudry, Hossein Abedi, Tom Bush
arxiv.org/abs/2508.21164

@cosmos4u@scicomm.xyz
2025-09-30 18:58:43

The 6 year radio lightcurve of the #TidalDisruptionEvent AT2019azh: arxiv.org/abs/2509.17525 -> Long-term radio observations track the evolution of a tidal disruption event: phys.org/news/2025-09-term-rad

@arXiv_csCV_bot@mastoxiv.page
2025-10-01 11:46:27

Image-Difficulty-Aware Evaluation of Super-Resolution Models
Atakan Topaloglu, Ahmet Bilican, Cansu Korkmaz, A. Murat Tekalp
arxiv.org/abs/2509.26398

@arXiv_csLG_bot@mastoxiv.page
2025-09-01 09:54:22

Comprehensive Signal Quality Evaluation of a Wearable Textile ECG Garment: A Sex-Balanced Study
Maximilian P. Oppelt, Tobias S. Zech, Sarah H. Lorenz, Laurenz Ottmann, Jan Steffan, Bjoern M. Eskofier, Nadine R. Lang-Richter, Norman Pfeiffer
arxiv.org/abs/2508.21554

@arXiv_statML_bot@mastoxiv.page
2025-09-30 08:20:38

Variance-Bounded Evaluation without Ground Truth: VB-Score
Kaihua Ding
arxiv.org/abs/2509.22751 arxiv.org/pdf/2509.22751

@arXiv_csAI_bot@mastoxiv.page
2025-10-01 11:26:57

SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs
Yixu Wang, Xin Wang, Yang Yao, Xinyuan Li, Yan Teng, Xingjun Ma, Yingchun Wang
arxiv.org/abs/2509.26100

@arXiv_csCL_bot@mastoxiv.page
2025-09-30 14:01:11

Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions
Luisa Geiger, Mareike Hartmann, Michael Sullivan, Alexander Koller
arxiv.org/abs/2509.24792

@arXiv_csCL_bot@mastoxiv.page
2025-09-30 14:13:36

Towards Personalized Deep Research: Benchmarks and Evaluations
Yuan Liang, Jiaxian Li, Yuqing Wang, Piaohong Wang, Motong Tian, Pai Liu, Shuofei Qiao, Runnan Fang, He Zhu, Ge Zhang, Minghao Liu, Yuchen Eleanor Jiang, Ningyu Zhang, Wangchunshu Zhou
arxiv.org/abs/2509.25106

@arXiv_csCL_bot@mastoxiv.page
2025-10-01 11:37:47

MENLO: From Preferences to Proficiency - Evaluating and Modeling Native-like Quality Across 47 Languages
Chenxi Whitehouse, Sebastian Ruder, Tony Lin, Oksana Kurylo, Haruka Takagi, Janice Lam, Nicolò Busetto, Denise Diaz
arxiv.org/abs/2509.26601