Tootfinder

Opt-in global Mastodon full text search.

No exact results. Similar results found.
@arXiv_csAI_bot@mastoxiv.page
2025-06-03 07:22:12

AgentAuditor: Human-Level Safety and Security Evaluation for LLM Agents
Hanjun Luo, Shenyu Dai, Chiming Ni, Xinfeng Li, Guibin Zhang, Kun Wang, Tongliang Liu, Hanan Salam
arxiv.org/abs/2506.00641

@arXiv_csIR_bot@mastoxiv.page
2025-06-03 07:26:37

FACE: A Fine-grained Reference Free Evaluator for Conversational Recommender Systems
Hideaki Joko, Faegheh Hasibi
arxiv.org/abs/2506.00314

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:50

From Guidelines to Practice: A New Paradigm for Arabic Language Model Evaluation
Serry Sibaee, Omer Nacar, Adel Ammar, Yasser Al-Habashi, Abdulrahman Al-Batati, Wadii Boulila
arxiv.org/abs/2506.01920

@arXiv_csRO_bot@mastoxiv.page
2025-06-03 07:58:41

Evaluating Robot Policies in a World Model
Julian Quevedo, Percy Liang, Sherry Yang
arxiv.org/abs/2506.00613 arxiv.or…

@arXiv_csNE_bot@mastoxiv.page
2025-06-03 07:22:24

Regionalized Metric Framework: A Novel Approach for Evaluating Multimodal Multi-Objective Optimization Algorithms
Jintai Chen, Fangqing Liu, Xueming Yan, Han Huang
arxiv.org/abs/2506.00468

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:33

CiteEval: Principle-Driven Citation Evaluation for Source Attribution
Yumo Xu, Peng Qi, Jifan Chen, Kunlun Liu, Rujun Han, Lan Liu, Bonan Min, Vittorio Castelli, Arshit Gupta, Zhiguo Wang
arxiv.org/abs/2506.01829

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:07

MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation
Yile Liu, Ziwei Ma, Xiu Jiang, Jinglu Hu, Jing Chang, Liang Li
arxiv.org/abs/2506.01776

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:41

CONFETTI: Conversational Function-Calling Evaluation Through Turn-Level Interactions
Tamer Alkhouli, Katerina Margatina, James Gung, Raphael Shu, Claudia Zaghi, Monica Sunkara, Yi Zhang
arxiv.org/abs/2506.01859

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:12

Human-Centric Evaluation for Foundation Models
Yijin Guo, Kaiyuan Ji, Xiaorong Zhu, Junying Wang, Farong Wen, Chunyi Li, Zicheng Zhang, Guangtao Zhai
arxiv.org/abs/2506.01793

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:20:56

RewardBench 2: Advancing Reward Model Evaluation
Saumya Malik, Valentina Pyatkin, Sander Land, Jacob Morrison, Noah A. Smith, Hannaneh Hajishirzi, Nathan Lambert
arxiv.org/abs/2506.01937