VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents
Jiliang Hu, Wenfu Wang, Zuchao Li, Chenxing Li, Yiyang Zhao, Hanzhao Li, Liqiang Zhang, Meng Yu, Dong Yu
https://arxiv.org/abs/2510.11098
Recent advances in large audio language models (LALMs) have greatly enhanced multimodal conversational systems. However, existing benchmarks remain limited: they are mainly English-centric, rely on synthetic speech, and lack comprehensive, discriminative evaluation across multiple dimensions. To address these gaps, we present Voice Chat Bot Bench (VCB Bench), a high-quality Chinese benchmark built entirely on real human speech. VCB Bench evaluates LALMs from three complementary perspectives…