CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
Yinghao Ma, Siyou Li, Juntao Yu, Emmanouil Benetos, Akira Maezawa
https://arxiv.org/abs/2506.12285
Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour
Emma M. A. Harrison
https://arxiv.org/abs/2507.13277
Hatevolution: What Static Benchmarks Don't Tell Us
Chiara Di Bonaventura, Barbara McGillivray, Yulan He, Albert Meroño-Peñuela
https://arxiv.org/abs/2506.12148
From Flat to Feeling: A Feasibility and Impact Study on Dynamic Facial Emotions in AI-Generated Avatars
Pegah Salehi, Sajad Amouei Sheshkal, Vajira Thambawita, Pål Halvorsen
https://arxiv.org/abs/2506.13477
Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing
Zhuoying Li, Zhu Xu, Yuxin Peng, Yang Liu
https://arxiv.org/abs/2506.13827
Improving Surgical Risk Prediction Through Integrating Automated Body Composition Analysis: a Retrospective Trial on Colectomy Surgery
Hanxue Gu, Yaqian Chen, Jisoo Lee, Diego Schaps, Regina Woody, Roy Colglazier, Maciej A. Mazurowski, Christopher Mantyh
https://arxiv.org/abs/2506.11996
How Many Instructions Can LLMs Follow at Once?
Daniel Jaroslawicz, Brendan Whiting, Parth Shah, Karime Maamari
https://arxiv.org/abs/2507.11538
How Warm-Glow Alters the Usability of Technology
Antonios Saravanos (New York University)
https://arxiv.org/abs/2506.14720
A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans
Anca Dinu, Andra-Maria Florescu, Alina Resceanu
https://arxiv.org/abs/2507.12039
AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences
Jieyu Li, Xin Zhang, Joey Tianyi Zhou
https://arxiv.org/abs/2508.10771