Bills getting defensive reinforcements in hopes to avoid 3-game slide https://www.espn.com/nfl/story/_/id/46700557/buffalo-bills-defensive-reinforcements-michael-hoecht-larry-ogunjob-week-8
Evaluation-Aware Reinforcement Learning
Shripad Vilasrao Deshmukh, Will Schwarzer, Scott Niekum
https://arxiv.org/abs/2509.19464 https://arxiv.org/pdf/2509…
Failure Modes of Maximum Entropy RLHF
Ömer Veysel Çağatan, Barış Akgün
https://arxiv.org/abs/2509.20265 https://arxiv.org/pdf/…
Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
Chaojun Nie, Jun Zhou, Guanxiang Wang, Shisong Wud, Zichen Wang
https://arxiv.org/abs/2509.20162
From Pheromones to Policies: Reinforcement Learning for Engineered Biological Swarms
Aymeric Vellinger, Nemanja Antonic, Elio Tuci
https://arxiv.org/abs/2509.20095 https://
MARG: MAstering Risky Gap Terrains for Legged Robots with Elevation Mapping
Yinzhao Dong, Ji Ma, Liu Zhao, Wanyue Li, Peng Lu
https://arxiv.org/abs/2509.20036 https://
Learning Robust Penetration-Testing Policies under Partial Observability: A systematic evaluation
Raphael Simon, Pieter Libin, Wim Mees
https://arxiv.org/abs/2509.20008 https://…
A Task-Efficient Reinforcement Learning Task-Motion Planner for Safe Human-Robot Cooperation
Gaoyuan Liu, Joris de Winter, Kelly Merckaert, Denis Steckelmacher, Ann Nowe, Bram Vanderborght
https://arxiv.org/abs/2510.12477
Reinforcing Diffusion Models by Direct Group Preference Optimization
Yihong Luo, Tianyang Hu, Jing Tang
https://arxiv.org/abs/2510.08425 https://arxiv.org/…
Offline Reinforcement Learning with Generative Trajectory Policies
Xinsong Feng, Leshu Tang, Chenan Wang, Haipeng Chen
https://arxiv.org/abs/2510.11499 https://