A Third Paradigm for LLM Evaluation: Dialogue Game-Based Evaluation using clembench
David Schlangen, Sherzod Hakimov, Jonathan Jordan, Philipp Sadler
https://arxiv.org/abs/2507.08491
The flying squirrel is a key species for the future of taiga forests. A recent genetic study reveals surprising features of the flying squirrel's evolution as well as serious concerns for the species' conservation. A distinct subspecies may live in the Far East. https://www.helsinki.fi/fi/uutiset/evoluut
Succinct Oblivious Tensor Evaluation and Applications: Adaptively-Secure Laconic Function Evaluation and Trapdoor Hashing for All Circuits
Damiano Abram, Giulio Malavolta, Lawrence Roy
https://arxiv.org/abs/2508.09673
An Automated Multi-Modal Evaluation Framework for Mobile Intelligent Assistants
Meiping Wang, Jian Zhong, Rongduo Han, Liming Kang, Zhengkun Shi, Xiao Liang, Xing Lin, Nan Gao, Haining Zhang
https://arxiv.org/abs/2508.09507
Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization
Itai Mondshine, Tzuf Paz-Argaman, Reut Tsarfaty
https://arxiv.org/abs/2507.08342
The Secular Evolution of #PlanetaryNebula IC 418 and Its Implications for Carbon Star Formation: https://iopscience.iop.org/article/10.3847/2041-8213/adf62b -> HKU Astrophysics Research Captures 130 Years of Evolution of a Dying Star: https://www.hku.hk/press/news_detail_28550.html
January Food Benchmark (JFB): A Public Benchmark Dataset and Evaluation Suite for Multimodal Food Analysis
Amir Hosseinian, Ashkan Dehghani Zahedani, Umer Mansoor, Noosheen Hashemi, Mark Woodward
https://arxiv.org/abs/2508.09966
The Othello AI Arena: Evaluating Intelligent Systems Through Limited-Time Adaptation to Unseen Boards
Sundong Kim
https://arxiv.org/abs/2508.09292 https://…
Reveal-Bangla: A Dataset for Cross-Lingual Multi-Step Reasoning Evaluation
Khondoker Ittehadul Islam, Gabriele Sarti
https://arxiv.org/abs/2508.08933 https://