Tootfinder

Opt-in global Mastodon full text search.

@arXiv_csCL_bot@mastoxiv.page
2025-07-29 11:50:11

Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
Jiaju Chen, Yuxuan Lu, Xiaojie Wang, Huimin Zeng, Jing Huang, Jiri Gesi, Ying Xu, Bingsheng Yao, Dakuo Wang
arxiv.org/abs/2507.21028

@NFL@darktundra.xyz
2025-06-30 16:36:21

NFL evaluators have seen decline in Steelers' Minkah Fitzpatrick, reportedly leading to him being available

cbssports.com/nfl/ne…

@arXiv_csCY_bot@mastoxiv.page
2025-07-29 10:03:12

Can You Share Your Story? Modeling Clients' Metacognition and Openness for LLM Therapist Evaluation
Minju Kim, Dongje Yoo, Yeonjun Hwang, Minseok Kang, Namyoung Kim, Minju Gwak, Beong-woo Kwak, Hyungjoo Chae, Harim Kim, Yunjoong Lee, Min Hee Kim, Dayi Jung, Kyong-Mee Chung, Jinyoung Yeo
arxiv.org/abs/2507.19643

@arXiv_csSE_bot@mastoxiv.page
2025-07-30 08:29:31

LLM4VV: Evaluating Cutting-Edge LLMs for Generation and Evaluation of Directive-Based Parallel Programming Model Compiler Tests
Zachariah Sollenberger, Rahul Patel, Saieda Ali Zada, Sunita Chandrasekaran
arxiv.org/abs/2507.21447

@arXiv_csSD_bot@mastoxiv.page
2025-07-29 10:04:51

Music Arena: Live Evaluation for Text-to-Music
Yonghyun Kim, Wayne Chi, Anastasios N. Angelopoulos, Wei-Lin Chiang, Koichi Saito, Shinji Watanabe, Yuki Mitsufuji, Chris Donahue
arxiv.org/abs/2507.20900

@arXiv_csHC_bot@mastoxiv.page
2025-06-30 09:18:50

How to Evaluate the Accuracy of Online and AI-Based Symptom Checkers: A Standardized Methodological Framework
Marvin Kopka, Markus A. Feufel
arxiv.org/abs/2506.22379

@arXiv_csLG_bot@mastoxiv.page
2025-08-29 10:11:51

Evaluating Differentially Private Generation of Domain-Specific Text
Yidan Sun, Viktor Schlegel, Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Warren Del-Pinto, Goran Nenadic, Siew-Kei Lam, Jie Zhang, Anil A Bharath
arxiv.org/abs/2508.20452

@arXiv_csCL_bot@mastoxiv.page
2025-07-29 11:42:51

Multilingual Self-Taught Faithfulness Evaluators
Carlo Alfano, Aymen Al Marjani, Zeno Jonke, Amin Mantrach, Saab Mansour, Marcello Federico
arxiv.org/abs/2507.20752

@arXiv_csCL_bot@mastoxiv.page
2025-08-29 10:26:21

ProactiveEval: A Unified Evaluation Framework for Proactive Dialogue Agents
Tianjian Liu, Fanqi Wan, Jiajian Guo, Xiaojun Quan
arxiv.org/abs/2508.20973

@arXiv_csCL_bot@mastoxiv.page
2025-06-30 10:21:20

Evaluating Scoring Bias in LLM-as-a-Judge
Qingquan Li, Shaoyu Dou, Kailai Shao, Chao Chen, Haixiang Hu
arxiv.org/abs/2506.22316