Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, Yancheng He, Ju Huang, Jiaheng Liu, Zhendong Li, Xiaoyang Li, Zichen Liu, Haizhou Zhao, Dakai An, Lunxi Cao, Qiyang Cao, Wanxi Deng, Feilei Du, Yiliang Gu, Jiahe Li, Xiang Li, Mingjie Liu, Yijia Luo, Zihe Liu, Yadao Wang, Pei Wang, Tianyuan Wu, Yanan Wu, Yuheng Zhao, Shuaibing Zhao, Jin Yang, Siran Yang, Yingshui Tan, …
Improving Long-Range Navigation with Spatially-Enhanced Recurrent Memory via End-to-End Reinforcement Learning
Fan Yang, Per Frivik, David Hoeller, Chen Wang, Cesar Cadena, Marco Hutter
https://arxiv.org/abs/2506.05997
How to craft a deep reinforcement learning policy for wind farm flow control
Elie Kadoche, Pascal Bianchi, Florence Carton, Philippe Ciblat, Damien Ernst
https://arxiv.org/abs/2506.06204
Deep Reinforcement Learning for Investor-Specific Portfolio Optimization: A Volatility-Guided Asset Selection Approach
Arishi Orra, Aryan Bhambu, Himanshu Choudhary, Manoj Thakur, Selvaraju Natarajan
https://arxiv.org/abs/2505.03760
Sequence Modeling for N-Agent Ad Hoc Teamwork
Caroline Wang, Di Yang Shi, Elad Liebman, Ishan Durugkar, Arrasy Rahman, Peter Stone
https://arxiv.org/abs/2506.05527
Delighted to see these two new papers come out in Nature (they've been on bioRxiv for a while).
How does Pavlov's dog learn that the bell predicts the food? One answer is that the bell appears "close" in time to the food and that enables learning. We're certain that dopamine has something to do with learning these kinds of associations. But the definition of "close" in time is actually really difficult to pin down. You can get associations over prett…
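A minimal sketch of one standard way "close in time" gets formalized in RL: tabular TD(λ) with an eligibility trace, where a cue (the bell) that precedes reward (the food) by several steps still receives credit through a decaying trace. This is purely illustrative and is not the model from the Nature papers; the toy episode and all parameter values below are assumptions.

```python
# Illustrative TD(lambda) sketch, not the papers' model: the eligibility trace
# decays geometrically, so earlier cues get progressively less credit for a
# later reward -- one concrete definition of "close" in time.
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.95, lam=0.9):
    """episodes: list of trajectories, each a list of (state, reward) pairs."""
    V = np.zeros(n_states)               # value estimate per state
    for episode in episodes:
        trace = np.zeros(n_states)       # eligibility: how recently was each state visited?
        for t in range(len(episode) - 1):
            s, _ = episode[t]
            s_next, r_next = episode[t + 1]
            delta = r_next + gamma * V[s_next] - V[s]  # TD error (the "dopamine-like" signal)
            trace *= gamma * lam         # older states become less eligible
            trace[s] += 1.0              # the just-visited state is maximally eligible
            V += alpha * delta * trace   # credit flows back along the trace
    return V

# Toy Pavlovian episode: bell at state 0, delay states 1-3, food (reward) on entering state 4.
episode = [(0, 0.0), (1, 0.0), (2, 0.0), (3, 0.0), (4, 1.0)]
print(td_lambda([episode] * 200, n_states=5))
```

With λ near 1 the bell state ends up with substantial value despite the delay; with λ = 0 credit only reaches the state immediately before the food, which is the crux of why pinning down "close" matters.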
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
Haochen Zhang, Zhong Zheng, Lingzhou Xue
https://arxiv.org/abs/2506.04626
Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.
The paper is the fruit of joint work with a great team of collaborators, among whom @… and @…
Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models
Rihui Jin, Zheyu Xin, Xing Xie, Zuoyi Li, Guilin Qi, Yongrui Chen, Xinbang Dai, Tongtong Wu, Gholamreza Haffari
https://arxiv.org/abs/2506.06137
Table reasoning (TR) requires structured reasoning over semi-structured tabular data and remains challenging, particularly for small language models (SLMs, e.g., LLaMA-8B) due to their limited capacity compared to large LMs (LLMs, e.g., GPT-4o). To narrow this gap, we explore program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR), notably in numerical reasoning, by generating executable programs. However, applying P-TR to SLMs introduces two challenges: (i) vulnerabi…
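A minimal sketch of the program-based table reasoning (P-TR) idea described in the abstract, not the Table-r1 pipeline itself: instead of answering in free text, the model emits a small executable program over the table, which offloads the numerical steps to an interpreter. The table, question, and "generated" program string below are hypothetical placeholders standing in for SLM output.

```python
# Illustrative P-TR sketch (assumed example, not Table-r1's actual prompting or training).
import pandas as pd

table = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Hanoi"], "population_m": [0.7, 10.0, 8.1]}
)
question = "Which city has the largest population?"

# Text-based TR (T-TR) would have the model reason over the serialized table directly.
# Program-based TR (P-TR) has the model write code like this instead:
generated_program = "answer = table.loc[table['population_m'].idxmax(), 'city']"

namespace = {"table": table}
exec(generated_program, namespace)   # execute the model-written program
print(namespace["answer"])           # -> "Lima"; arithmetic/comparison handled by pandas, not the SLM
```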
Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning
Xunzhu Tang, Jacques Klein, Tegawendé F. Bissyandé
https://arxiv.org/abs/2506.03921
Pearl: Automatic Code Optimization Using Deep Reinforcement Learning
Djamel Rassem Lamouri, Iheb Nassim Aouadj, Smail Kourta, Riyadh Baghdadi
https://arxiv.org/abs/2506.01880
BASIL: Best-Action Symbolic Interpretable Learning for Evolving Compact RL Policies
Kourosh Shahnazari, Seyed Moein Ayyoubzadeh, Mohammadali Keshtparvar
https://arxiv.org/abs/2506.00328
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
Fangyu Lei, Jinxiang Meng, Yiming Huang, Tinghong Chen, Yun Zhang, Shizhu He, Jun Zhao, Kang Liu
https://arxiv.org/abs/2506.01710
Modeling human reputation-seeking behavior in a spatio-temporally complex public good provision game
Edward Hughes, Tina O. Zhu, Martin J. Chadwick, Raphael Koster, Antonio García Castañeda, Charles Beattie, Thore Graepel, Matthew M. Botvinick, Joel Z. Leibo
https://arxiv.org/abs/2506.06032…
Crowd-SFT: Crowdsourcing for LLM Alignment
Alex Sotiropoulos, Sulyab Thottungal Valapu, Linus Lei, Jared Coleman, Bhaskar Krishnamachari
https://arxiv.org/abs/2506.04063
A Novel Deep Reinforcement Learning Method for Computation Offloading in Multi-User Mobile Edge Computing with Decentralization
Nguyen Chi Long, Trinh Van Chien, Ta Hai Tung, Van Son Nguyen, Trong-Minh Hoang, Nguyen Ngoc Hai Dang
https://arxiv.org/abs/2506.02458
Interpretable reinforcement learning for heat pump control through asymmetric differentiable decision trees
Toon Van Puyvelde, Mehran Zareh, Chris Develder
https://arxiv.org/abs/2506.01641
Discounting and Drug Seeking in Biological Hierarchical Reinforcement Learning
Vardhan Palod, Pranav Mahajan, Veeky Baths, Boris S. Gutkin
https://arxiv.org/abs/2506.04549
AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
Geonwoo Cho, Jaemoon Lee, Jaegyun Im, Subi Lee, Jihwan Lee, Sundong Kim
https://arxiv.org/abs/2506.05980
For the widespread use of #AI, especially in the context of #schools, it must be ensured that #LLMs do not encourage users to engage in self-endangering behavior.
The nonprofit Transluce is working on various…
A Reinforcement Learning-Based Telematic Routing Protocol for the Internet of Underwater Things
Mohammadhossein Homaei, Mehran Tarif, Agustin Di Bartolo, Oscar Mogollon Gutierrez, Mar Avila
https://arxiv.org/abs/2506.00133
CRScore: Reinforcement Learning with Verifiable Tool and AI Feedback for Code Review
Manav Nitin Kapadnis, Atharva Naik, Carolyn Rose
https://arxiv.org/abs/2506.00296
On-board Mission Replanning for Adaptive Cooperative Multi-Robot Systems
Elim Kwan, Rehman Qureshi, Liam Fletcher, Colin Laganier, Victoria Nockles, Richard Walters
https://arxiv.org/abs/2506.06094
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin
https://arx…
Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning
Weiyang Guo, Zesheng Shi, Zhuo Li, Yequan Wang, Xuebo Liu, Wenya Wang, Fangming Liu, Min Zhang, Jing Li
https://arxiv.org/abs/2506.00782
Learning optimal treatment strategies for intraoperative hypotension using deep reinforcement learning
Esra Adiyeke, Tianqi Liu, Venkata Sai Dheeraj Naganaboina, Han Li, Tyler J. Loftus, Yuanfang Ren, Benjamin Shickel, Matthew M. Ruppert, Karandeep Singh, Ruogu Fang, Parisa Rashidi, Azra Bihorac, Tezcan Ozrazgat-Baslanti
https://
Maximizing the Promptness of Metaverse Systems using Edge Computing by Deep Reinforcement Learning
Tam Ninh Thi-Thanh, Trinh Van Chien, Hung Tran, Nguyen Hoai Son, Van Nhan Vo
https://arxiv.org/abs/2506.02657
Reusing Trajectories in Policy Gradients Enables Fast Convergence
Alessandro Montenegro, Federico Mansutti, Marco Mussi, Matteo Papini, Alberto Maria Metelli
https://arxiv.org/abs/2506.06178
Towards Language-Augmented Multi-Agent Deep Reinforcement Learning
Maxime Toquebiau, Jae-Yun Jun, Faïz Benamar, Nicolas Bredeche
https://arxiv.org/abs/2506.05236
ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking
Xianming Li, Aamir Shakir, Rui Huang, Julius Lipp, Jing Li
https://arxiv.org/abs/2506.03487
HMPC-assisted Adversarial Inverse Reinforcement Learning for Smart Home Energy Management
Jiadong He, Liang Yu, Zhiqiang Chen, Dawei Qiu, Dong Yue, Goran Strbac, Meng Zhang, Yujian Ye, Yi Wang
https://arxiv.org/abs/2506.00898
A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization
Muhammed Ustaomeroglu, Guannan Qu
https://arxiv.org/abs/2506.06179
Autonomous Vehicle Lateral Control Using Deep Reinforcement Learning with MPC-PID Demonstration
Chengdong Wu, Sven Kirchner, Nils Purschke, Alois C. Knoll
https://arxiv.org/abs/2506.04040
Federated Deep Reinforcement Learning-Driven O-RAN for Automatic Multirobot Reconfiguration
Faisal Ahmed, Myungjin Lee, Shao-Yu Lien, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin
https://arxiv.org/abs/2506.00822
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Zhongwei Wan, Zhihao Dou, Che Liu, Yu Zhang, Dongfei Cui, Qinjian Zhao, Hui Shen, Jing Xiong, Yi Xin, Yifan Jiang, Yangfan He, Mi Zhang, Shen Yan
https://arxiv.org/abs/2506.01713
Reinforcement Learning with Data Bootstrapping for Dynamic Subgoal Pursuit in Humanoid Robot Navigation
Chengyang Peng, Zhihao Zhang, Shiting Gong, Sankalp Agrawal, Keith A. Redmill, Ayonga Hereid
https://arxiv.org/abs/2506.02206
Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning
Jinbao Wang, Shiliang Zhang, Jun Liu, Xuehui Ma, Haolin Liu
https://arxiv.org/abs/2505.24572
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Luu Tuan Tuan, Yue Liu, Yuhao Qing, Dong Huang, Xinyi He, Qian Liu, Zejun Ma, See-kiong Ng
https://arxiv.org/abs/2505.23387
Sorrel: A simple and flexible framework for multi-agent reinforcement learning
Rebekah A. Gelpí, Yibing Ju, Ethan C. Jackson, Yikai Tang, Shon Verch, Claas Voelcker, William A. Cunningham
https://arxiv.org/abs/2506.00228
LAMARL: LLM-Aided Multi-Agent Reinforcement Learning for Cooperative Policy Generation
Guobin Zhu, Rui Zhou, Wenkang Ji, Shiyu Zhao
https://arxiv.org/abs/2506.01538
Robust and Safe Multi-Agent Reinforcement Learning Framework with Communication for Autonomous Vehicles
Keshawn Smith, Zhili Zhang, H M Sabbir Ahmad, Ehsan Sabouni, Maniak Mondal, Song Han, Wenchao Li, Fei Miao
https://arxiv.org/abs/2506.00982
R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning
Harsh Goel, Mohammad Omama, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Sandeep Chinchali
https://arxiv.org/abs/2505.24265
DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving
Dawood Wasif, Terrence J Moore, Chandan K Reddy, Jin-Hee Cho
https://arxiv.org/abs/2506.00819
Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multi-Agent Reinforcement Learning
Pengcheng Dai, Yuanqiu Mo, Wenwu Yu, Wei Ren
https://arxiv.org/abs/2505.24113
EDEN: Entorhinal Driven Egocentric Navigation Toward Robotic Deployment
Mikolaj Walczak, Romina Aalishah, Wyatt Mackey, Brittany Story, David L. Boothe Jr., Nicholas Waytowich, Xiaomin Lin, Tinoosh Mohsenin
https://arxiv.org/abs/2506.03046
A Hierarchical Bin Packing Framework with Dual Manipulators via Heuristic Search and Deep Reinforcement Learning
Beomjoon Lee, Changjoo Nam
https://arxiv.org/abs/2506.01628
Reactive Aerobatic Flight via Reinforcement Learning
Zhichao Han, Xijie Huang, Zhuxiu Xu, Jiarui Zhang, Yuze Wu, Mingyang Wang, Tianyue Wu, Fei Gao
https://arxiv.org/abs/2505.24396
Disturbance-Aware Adaptive Compensation in Hybrid Force-Position Locomotion Policy for Legged Robots
Yang Zhang, Buqing Nie, Zhanxiang Cao, Yangqing Fu, Yue Gao
https://arxiv.org/abs/2506.00472