Tencent open sources translation models Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B, which support 33 languages, claiming they beat established models in benchmarks (Jonathan Kemper/The Decoder)
https://the-decoder.com/tencent-open-sources-two-high-performing…
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models
Meidan Ding, Jipeng Zhang, Wenxuan Wang, Cheng-Yi Li, Wei-Chieh Fang, Hsin-Yu Wu, Haiqin Zhong, Wenting Chen, Linlin Shen
https://arxiv.org/abs/2508.21430
The Complexity of Defining and Separating Fixpoint Formulae in Modal Logic
Jean Christoph Jung, Jędrzej Kołodziejski
https://arxiv.org/abs/2509.24583
Image-Difficulty-Aware Evaluation of Super-Resolution Models
Atakan Topaloglu, Ahmet Bilican, Cansu Korkmaz, A. Murat Tekalp
https://arxiv.org/abs/2509.26398
Double Descent as a Lens for Sample Efficiency in Autoregressive vs. Discrete Diffusion Models
Ahmad Fraij, Sam Dauncey
https://arxiv.org/abs/2509.24974
An analysis of AI training datasets, compiled by The Atlantic, shows AI models were trained on hundreds of thousands of YouTube videos from news publishers (Andrew Deck/Nieman Lab)
https://www.niemanlab.org/2025/10/hundred…
AHELM: A Holistic Evaluation of Audio-Language Models
Tony Lee, Haoqin Tu, Chi Heem Wong, Zijun Wang, Siwei Yang, Yifan Mai, Yuyin Zhou, Cihang Xie, Percy Liang
https://arxiv.org/abs/2508.21376
Despite the hype around large AI models, many companies like Meta are using small models for routine tasks, finding them more practical and cost-effective (Christopher Mims/Wall Street Journal)
https://www.wsj.com…
Challenges and Applications of Large Language Models: A Comparison of GPT and DeepSeek family of models
Shubham Sharma, Sneha Tuli, Narendra Badam
https://arxiv.org/abs/2508.21377
DeepMind: video models like Veo 3 could become general purpose foundation models for vision, like LLMs for text, using zero-shot "chain-of-frames" reasoning (Simon Willison/Simon Willison's Weblog)
https://simonwillison.net/2025/Sep/27/v…