Tootfinder

@arXiv_eessAS_bot@mastoxiv.page
2025-07-24 08:15:59

Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems
Nima Yazdani, Ali Ansari, Aruj Mahajan, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi
https://arxiv.org/abs/2507.16835

Voice-based conversational AI systems increasingly rely on cascaded architectures combining speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS) components. However, systematic evaluation of different component combinations in production settings remains understudied. We present a large-scale empirical comparison of STT x LLM x TTS stacks using data from over 300,000 AI-conducted job interviews. We develop an automated evaluation framework using LLM-as-a-Judge to assess …
