OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows
Weixuan Wang, Dongge Han, Daniel Madrigal Diaz, Jin Xu, Victor Rühle, Saravan Rajmohan
https://arxiv.org/abs/2508.09124
ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks
Kaijun Wang, Liqin Lu, Mingyu Liu, Jianuo Jiang, Zeju Li, Bolin Zhang, Wancai Zheng, Xinyi Yu, Hao Chen, Chunhua Shen
https://arxiv.org/abs/2508.08240
Mediator-Guided Multi-Agent Collaboration among Open-Source Models for Medical Decision-Making
Kaitao Chen, Mianxin Liu, Daoming Zong, Chaoyue Ding, Shaohao Rui, Yankai Jiang, Mu Zhou, Xiaosong Wang
https://arxiv.org/abs/2508.05996
Testing new-physics scenarios with the combined LHAASO and Carpet-3 fluence spectrum of GRB 221009A: axion-like particles and Lorentz-invariance violation
P. S. Satunin, S. V. Troitsky
https://arxiv.org/abs/2510.07234
ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls
Sanket Badhe
https://arxiv.org/abs/2508.06457
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
Andong Deng, Taojiannan Yang, Shoubin Yu, Lincoln Spencer, Mohit Bansal, Chen Chen, Serena Yeung-Levy, Xiaohan Wang
https://arxiv.org/abs/2510.08559
Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management
Miao Lu, Weiwei Sun, Weihua Du, Zhan Ling, Xuesong Yao, Kang Liu, Jiecao Chen
https://arxiv.org/abs/2510.06727
Just finished "Sinners".
Have a feeling it's going to stay with me for a while. Anyone who says "cinema" is dying is digging in the wrong place.
https://www.themoviedb.org/movie/1233413-sinners
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
Yi Lu, Jianing Wang, Linsen Guo, Wei He, Hongyin Tang, Tao Gui, Xuanjing Huang, Xuezhi Cao, Wei Wang, Xunliang Cai
https://arxiv.org/abs/2510.08189
DACIP-RC: Domain Adaptive Continual Instruction Pre-Training via Reading Comprehension on Business Conversations
Elena Khasanova, Harsh Saini, Md Tahmid Rahman Laskar, Xue-Yong Fu, Cheng Chen, Shashi Bhushan TN
https://arxiv.org/abs/2510.08152