Tootfinder

Opt-in global Mastodon full text search. Join the index!

@heiseonline@social.heise.de
2025-09-11 16:38:00

Research: Are humans and AI merging into an "evolutionary individual"?
With the development of AI technology, humanity may have set off a major evolutionary transition. At least, that is what two evolutionary biologists argue.

@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:36:10

A methodology for clinically driven interactive segmentation evaluation
Parhom Esmaeili, Virginia Fernandez, Pedro Borges, Eli Gibson, Sebastien Ourselin, M. Jorge Cardoso
arxiv.org/abs/2510.09499

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:29:10

MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics
Jiapeng Wang, Changxin Tian, Kunlong Chen, Ziqi Liu, Jiaxin Mao, Wayne Xin Zhao, Zhiqiang Zhang, Jun Zhou
arxiv.org/abs/2510.09295

@arXiv_csAI_bot@mastoxiv.page
2025-10-13 08:32:50

What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
Allison Sihan Jia, Daniel Huang, Nikhil Vytla, Nirvika Choudhury, John C Mitchell, Anupam Datta
arxiv.org/abs/2510.08847

@arXiv_astrophSR_bot@mastoxiv.page
2025-10-13 08:40:50

The Sonora Substellar Atmosphere Models VI. Red Diamondback: Extending Diamondback with SPHINX for Brown Dwarf Early Evolution
C. Evan Davis, Jonathan J. Fortney, Aishwarya Iyer, Sagnick Mukherjee, Caroline V. Morley, Mark S. Marley, Michael Line, Philip S. Muirhead
arxiv.org/abs/2510.08694

@arXiv_csSE_bot@mastoxiv.page
2025-09-11 08:53:43

Beyond the Binary: The System of All-round Evaluation of Research and Its Practices in China
Yu Zhu, Jiyuan Ye
arxiv.org/abs/2509.08546 arx…

@arXiv_csAI_bot@mastoxiv.page
2025-10-13 09:33:10

TripScore: Benchmarking and rewarding real-world travel planning with fine-grained evaluation
Yincen Qu, Huan Xiao, Feng Li, Hui Zhou, Xiangying Dai
arxiv.org/abs/2510.09011

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:31:50

ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering
Francesco Maria Molfese, Luca Moroni, Ciro Porcaro, Simone Conia, Roberto Navigli
arxiv.org/abs/2510.09351

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:28:20

Inflated Excellence or True Performance? Rethinking Medical Diagnostic Benchmarks with Dynamic Evaluation
Xiangxu Zhang, Lei Li, Yanyun Zhou, Xiao Zhou, Yingying Zhang, Xian Wu
arxiv.org/abs/2510.09275

@arXiv_csCL_bot@mastoxiv.page
2025-08-13 10:05:22

Reveal-Bangla: A Dataset for Cross-Lingual Multi-Step Reasoning Evaluation
Khondoker Ittehadul Islam, Gabriele Sarti
arxiv.org/abs/2508.08933