AHELM: A Holistic Evaluation of Audio-Language Models
Tony Lee, Haoqin Tu, Chi Heem Wong, Zijun Wang, Siwei Yang, Yifan Mai, Yuyin Zhou, Cihang Xie, Percy Liang
https://arxiv.org/abs/2508.21376
Quantifying Label-Induced Bias in Large Language Model Self- and Cross-Evaluations
Muskan Saraf, Sajjad Rezvani Boroujeni, Justin Beaudry, Hossein Abedi, Tom Bush
https://arxiv.org/abs/2508.21164
The 6 year radio lightcurve of the tidal disruption event AT2019azh
https://arxiv.org/abs/2509.17525
Long-term radio observations track the evolution of a tidal disruption event
https://phys.org/news/2025-09-term-radio-track-evolution-tidal.html
Image-Difficulty-Aware Evaluation of Super-Resolution Models
Atakan Topaloglu, Ahmet Bilican, Cansu Korkmaz, A. Murat Tekalp
https://arxiv.org/abs/2509.26398
Comprehensive Signal Quality Evaluation of a Wearable Textile ECG Garment: A Sex-Balanced Study
Maximilian P. Oppelt, Tobias S. Zech, Sarah H. Lorenz, Laurenz Ottmann, Jan Steffan, Bjoern M. Eskofier, Nadine R. Lang-Richter, Norman Pfeiffer
https://arxiv.org/abs/2508.21554
Variance-Bounded Evaluation without Ground Truth: VB-Score
Kaihua Ding
https://arxiv.org/abs/2509.22751
SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs
Yixu Wang, Xin Wang, Yang Yao, Xinyuan Li, Yan Teng, Xingjun Ma, Yingchun Wang
https://arxiv.org/abs/2509.26100
Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions
Luisa Geiger, Mareike Hartmann, Michael Sullivan, Alexander Koller
https://arxiv.org/abs/2509.24792
Towards Personalized Deep Research: Benchmarks and Evaluations
Yuan Liang, Jiaxian Li, Yuqing Wang, Piaohong Wang, Motong Tian, Pai Liu, Shuofei Qiao, Runnan Fang, He Zhu, Ge Zhang, Minghao Liu, Yuchen Eleanor Jiang, Ningyu Zhang, Wangchunshu Zhou
https://arxiv.org/abs/2509.25106
MENLO: From Preferences to Proficiency - Evaluating and Modeling Native-like Quality Across 47 Languages
Chenxi Whitehouse, Sebastian Ruder, Tony Lin, Oksana Kurylo, Haruka Takagi, Janice Lam, Nicolò Busetto, Denise Diaz
https://arxiv.org/abs/2509.26601