Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion RecognitionYoujun Chen, Xurong Xie, Haoning Xu, Mengzhe Geng, Guinan Li, Chengxi Deng, Huimeng Wang, Shujie Hu, Xunying Liuhttps://arxiv.org/abs/2505.23236
Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion RecognitionThis paper presents a novel end-to-end LLM-empowered explainable speech emotion recognition (SER) approach. Fine-grained speech emotion descriptor (SED) features, e.g., pitch, tone and emphasis, are disentangled from HuBERT SSL representations via alternating LLM fine-tuning to joint SER-SED prediction and ASR tasks. VAE compressed HuBERT features derived via Information Bottleneck (IB) are used to adjust feature granularity. Experiments on the IEMOCAP and MELD benchmarks demonstrate that our a…