ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Hanyang Chen, Mark Zhao, Rui Yang, Qinwei Ma, Ke Yang, Jiarui Yao, Kangrui Wang, Hao Bai, Zhenhailong Wang, Rui Pan, Mengchao Zhang, Jose Barreiros, Aykut Onol, ChengXiang Zhai, Heng Ji, Manling Li, Huan Zhang, Tong Zhang
https://arxiv…
Q&A with Andrej Karpathy on AGI still being a decade away, why reinforcement learning is terrible, superintelligence, his AI education startup Eureka, and more (Dwarkesh Patel/Dwarkesh Podcast)
https://www.dwarkesh.com/p/andrej-karpathy
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
Xiaoji Zheng, Ziyuan Yang, Yanhao Chen, Yuhang Peng, Yuanrong Tang, Gengyuan Liu, Bokui Chen, Jiangtao Gong
https://arxiv.org/abs/2510.12560
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
Wenhan Ma, Hailin Zhang, Liang Zhao, Yifan Song, Yudong Wang, Zhifang Sui, Fuli Luo
https://arxiv.org/abs/2510.11370
Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Stavros Orfanoudakis, Frans Oliehoek, Peter Palensky, Pedro P. Vergara
https://arxiv.org/abs/2510.12335
Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning
Simin Li, Zihao Mao, Hanxiao Li, Zonglei Jing, Zhuohang Bian, Jun Guo, Li Wang, Zhuoran Han, Ruixiao Xu, Xin Yu, Chengdong Ma, Yuqing Ma, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu
https://arxiv.org/abs/2510.11824
Inclusive Fitness as a Key Step Towards More Advanced Social Behaviors in Multi-Agent Reinforcement Learning Settings
Andries Rosseau, Raphaël Avalos, Ann Nowé
https://arxiv.org/abs/2510.12555
A Flexible Multi-Agent Deep Reinforcement Learning Framework for Dynamic Routing and Scheduling of Latency-Critical Services
Vincenzo Norman Vitale, Antonia Maria Tulino, Andreas F. Molisch, Jaime Llorca
https://arxiv.org/abs/2510.11535
Spotlight on Token Perception for Multimodal Reinforcement Learning
Siyuan Huang, Xiaoye Qu, Yafu Li, Yun Luo, Zefeng He, Daizong Liu, Yu Cheng
https://arxiv.org/abs/2510.09285
Meta-learning three-factor plasticity rules for structured credit assignment with sparse feedback
Dimitra Maoutsa
https://arxiv.org/abs/2512.09366
Abstract: Biological neural networks learn complex behaviors from sparse, delayed feedback using local synaptic plasticity, yet the mechanisms enabling structured credit assignment remain elusive. In contrast, artificial recurrent networks solving similar tasks typically rely on biologically implausible global learning rules or hand-crafted local updates. The space of local plasticity rules capable of supporting learning from delayed reinforcement remains largely unexplored. Here, we present a meta-learning framework that discovers local learning rules for structured credit assignment in recurrent networks trained with sparse feedback. Our approach interleaves local neo-Hebbian-like updates during task execution with an outer loop that optimizes plasticity parameters via tangent-propagation through learning. The resulting three-factor learning rules enable long-timescale credit assignment using only local information and delayed rewards, offering new insights into biologically grounded mechanisms for learning in recurrent circuits.
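A minimal sketch of the kind of procedure the abstract describes, assuming a toy delayed-reward recall task with a plastic readout layer rather than a full recurrent circuit; the network sizes, the specific reward-modulated Hebbian rule, and the use of reverse-mode backprop through the unrolled inner loop (in place of the paper's tangent propagation) are illustrative assumptions, not the authors' implementation.

# Sketch (PyTorch): meta-learning a three-factor plasticity rule.
# Inner loop: local, reward-modulated Hebbian updates with an eligibility trace.
# Outer loop: gradient descent on the plasticity parameters, obtained by
# differentiating through the unrolled inner learning process.
import torch

torch.manual_seed(0)
N_IN, N_HID, T_EPISODE, N_INNER = 4, 16, 10, 5

# Meta-parameters of the plasticity rule (optimized in the outer loop):
# a log learning rate and a logit for the eligibility-trace decay.
log_eta = torch.zeros(1, requires_grad=True)
decay_logit = torch.zeros(1, requires_grad=True)
meta_opt = torch.optim.Adam([log_eta, decay_logit], lr=3e-2)

w_in = torch.randn(N_HID, N_IN) * 0.3   # fixed input weights
w_out_init = torch.zeros(1, N_HID)      # plastic readout weights


def run_episode(w_out, eta, decay):
    """One episode: drive the network with a cue, accumulate a Hebbian
    eligibility trace, receive a single delayed reward at the end, and
    apply the local three-factor update dw = eta * reward * trace."""
    cue = torch.randn(N_IN)
    target = cue[:1]                      # task: recall the first cue element
    trace = torch.zeros_like(w_out)
    for _ in range(T_EPISODE):
        h = torch.tanh(w_in @ cue)        # presynaptic activity for the readout synapses
        y = w_out @ h                     # postsynaptic readout
        # Eligibility trace: decayed accumulation of post*pre co-activity.
        trace = decay * trace + y.unsqueeze(1) * h.unsqueeze(0)
    reward = -((y - target) ** 2).sum()   # sparse, delayed scalar feedback
    w_out = w_out + eta * reward * trace  # third factor gates the local update
    return w_out, ((y - target) ** 2).sum()


for meta_step in range(201):
    eta, decay = torch.exp(log_eta), torch.sigmoid(decay_logit)
    w_out, loss = w_out_init.clone(), None
    for _ in range(N_INNER):              # inner loop: learn with the current rule
        w_out, loss = run_episode(w_out, eta, decay)
    meta_opt.zero_grad()
    loss.backward()                       # outer loop: differentiate through learning
    meta_opt.step()
    if meta_step % 50 == 0:
        print(f"meta-step {meta_step:3d}  post-learning loss {loss.item():.4f}")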
T³: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
Deyu Zou, Yongqiang Chen, Jianxiang Wang, Haochen Yang, Mufei Li, James Cheng, Pan Li, Yu Gong
https://arxiv.org/abs/2510.12264
A Task-Efficient Reinforcement Learning Task-Motion Planner for Safe Human-Robot Cooperation
Gaoyuan Liu, Joris de Winter, Kelly Merckaert, Denis Steckelmacher, Ann Nowe, Bram Vanderborght
https://arxiv.org/abs/2510.12477
PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing
Jianhan Zhang, Jitao Wang, Chengchun Shi, John D. Piette, Donglin Zeng, Zhenke Wu
https://arxiv.org/abs/2510.06935
Reinforced Preference Optimization for Recommendation
Junfei Tan, Yuxin Chen, An Zhang, Junguang Jiang, Bin Liu, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Xiang Wang
https://arxiv.org/abs/2510.12211
Utilizing Model-Free Reinforcement Learning for Optimizing Secure Multi-Party Computation Protocols
Javad Sayyadi, Mahdi Nangir, Mahmood Mohassel Feghhi, Hamid Sayyadi
https://arxiv.org/abs/2510.07814
A Physics-Informed Reinforcement Learning Approach for Degradation-Aware Long-Term Charging Optimization in Batteries
Shanthan Kumar Padisala, Bharatkumar Hegde, Ibrahim Haskara, Satadru Dey
https://arxiv.org/abs/2510.11515
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Wei Huang, Yi Ge, Shuai Yang, Yicheng Xiao, Huizi Mao, Yujun Lin, Hanrong Ye, Sifei Liu, Ka Chun Cheung, Hongxu Yin, Yao Lu, Xiaojuan Qi, Song Han, Yukang Chen
https://arxiv.org/abs/2510.11696
Autonomous vehicles need social awareness to find optima in multi-agent reinforcement learning routing games
Anastasia Psarou, Łukasz Gorczyca, Dominik Gaweł, Rafał Kucharski
https://arxiv.org/abs/2510.11410
Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning
Alvaro Belmonte-Baeza, Miguel Cazorla, Gabriel J. García, Carlos J. Pérez-Del-Pulgar, Jorge Pomares
https://arxiv.org/abs/2510.12684
TTRV: Test-Time Reinforcement Learning for Vision Language Models
Akshit Singh, Shyam Marjit, Wei Lin, Paul Gavrikov, Serena Yeung-Levy, Hilde Kuehne, Rogerio Feris, Sivan Doveh, James Glass, M. Jehanzeb Mirza
https://arxiv.org/abs/2510.06783
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[2/5]:
- DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical...
Yichun Feng, Jiawei Wang, Lu Zhou, Zhen Lei, Yixue Li
Obstacle Avoidance using Dynamic Movement Primitives and Reinforcement Learning
Dominik Urbaniak, Alejandro Agostini, Pol Ramon, Jan Rosell, Raúl Suárez, Michael Suppa
https://arxiv.org/abs/2510.09254
No MoCap Needed: Post-Training Motion Diffusion Models with Reinforcement Learning using Only Textual Prompts
Girolamo Macaluso, Lorenzo Mandelli, Mirko Bicchierai, Stefano Berretti, Andrew D. Bagdanov
https://arxiv.org/abs/2510.06988
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Marlon Müller, Florian Finkeldei, Hanna Krasowski, Murat Arcak, Matthias Althoff
https://arxiv.org/abs/2510.06970
L2M-AID: Autonomous Cyber-Physical Defense by Fusing Semantic Reasoning of Large Language Models with Multi-Agent Reinforcement Learning (Preprint)
Tianxiang Xu, Zhichao Wen, Xinyu Zhao, Jun Wang, Yan Li, Chang Liu
https://arxiv.org/abs/2510.07363
FM-IRL: Flow-Matching for Reward Modeling and Policy Regularization in Reinforcement Learning
Zhenglin Wan, Jingxuan Wu, Xingrui Yu, Chubin Zhang, Mingcong Lei, Bo An, Ivor Tsang
https://arxiv.org/abs/2510.09222
Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning
Andrew Lee, Ian Chuang, Dechen Gao, Kai Fukazawa, Iman Soltani
https://arxiv.org/abs/2510.08442
From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning
Li Zeqiao, Wang Yijing, Wang Haoyu, Li Zheng, Li Peng, Liu Wenfei, Zuo Zhiqiang
https://arxiv.org/abs/2510.06038
Crosslisted article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[3/3]:
- QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Huang, Ge, Yang, Xiao, Mao, Lin, Ye, Liu, Cheung, Yin, Lu, Qi, Han, Chen
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[3/14]:
- OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics
Alexandre Oliveira, Katarina Dyreby, Francisco Caldas, Cláudia Soares
Laminar: A Scalable Asynchronous RL Post-Training Framework
Guangming Sheng, Yuxuan Tong, Borui Wan, Wang Zhang, Chaobo Jia, Xibin Wu, Yuqi Wu, Xiang Li, Chi Zhang, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu
https://arxiv.org/abs/2510.12633
From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu
https://arxiv.org/abs/2510.11457
Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation
Maggie Wang, Stephen Tian, Aiden Swann, Ola Shorinwa, Jiajun Wu, Mac Schwager
https://arxiv.org/abs/2510.11689
TaoSR-SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance
Pengkun Jiao, Yiming Jin, Jianhui Yang, Chenhe Dong, Zerui Huang, Shaowei Yao, Xiaojiang Zhou, Dan Ou, Haihong Tang
https://arxiv.org/abs/2510.07972
Reinforcement Learning from Probabilistic Forecasts for Safe Decision-Making via Conditional Value-at-Risk Planning
Michal Koren, Or Peretz, Tai Dinh, Philip S. Yu
https://arxiv.org/abs/2510.08226
Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning
Yash Jhaveri, Harley Wiltzer, Patrick Shafto, Marc G. Bellemare, David Meger
https://arxiv.org/abs/2510.08526
Reflection-Based Task Adaptation for Self-Improving VLA
Baicheng Li, Dong Wu, Zike Yan, Xinchen Liu, Zecui Zeng, Lusong Li, Hongbin Zha
https://arxiv.org/abs/2510.12710
Agent Learning via Early Experience
Kai Zhang, Xiangchao Chen, Bo Liu, Tianci Xue, Zeyi Liao, Zhihan Liu, Xiyao Wang, Yuting Ning, Zhaorun Chen, Xiaohan Fu, Jian Xie, Yuxuan Sun, Boyu Gou, Qi Qi, Zihang Meng, Jianwei Yang, Ning Zhang, Xian Li, Ashish Shah, Dat Huynh, Hengduo Li, Zi Yang, Sara Cao, Lawrence Jang, Shuyan Zhou, Jiacheng Zhu, Huan Sun, Jason Weston, Yu Su, Yifan Wu
DeepEN: Personalized Enteral Nutrition for Critically Ill Patients using Deep Reinforcement Learning
Daniel Jason Tan, Jiayang Chen, Dilruk Perera, Kay Choong See, Mengling Feng
https://arxiv.org/abs/2510.08350
A Benchmark Study of Deep Reinforcement Learning Algorithms for the Container Stowage Planning Problem
Yunqi Huang, Nishith Chennakeshava, Alexis Carras, Vladislav Neverov, Wei Liu, Aske Plaat, Yingjie Fan
https://arxiv.org/abs/2510.02589
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[4/5]:
- Application of Deep Reinforcement Learning to At-the-Money S&P 500 Options Hedging
Zofia Bracha, Paweł Sakowski, Jakub Michańków
Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning
Wenxun Wu, Yuanyang Li, Guhan Chen, Linyue Wang, Hongyang Chen
https://arxiv.org/abs/2510.07038
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[4/8]:
- Latent Variable Modeling in Multi-Agent Reinforcement Learning via Expectation-Maximization for U...
Mazyar Taghavi, Rahman Farnoosh
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[5/8]:
- Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Genera...
Wu, Gao, Shi, Li, Xu, Zhang, Zhu, Wang, Luo, Wang, Wu, Huang
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[3/8]:
- EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework
Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Yuzhi Zhang, Yue Wang
Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski
https://arxiv.org/abs/2510.07257
Multi-Actor Multi-Critic Deep Deterministic Reinforcement Learning with a Novel Q-Ensemble Method
Andy Wu, Chun-Cheng Lin, Rung-Tzuo Liaw, Yuehua Huang, Chihjung Kuo, Chia Tong Weng
https://arxiv.org/abs/2510.01083
Ego-Vision World Model for Humanoid Contact Planning
Hang Liu, Yuman Gao, Sangli Teng, Yufeng Chi, Yakun Sophia Shao, Zhongyu Li, Maani Ghaffari, Koushil Sreenath
https://arxiv.org/abs/2510.11682
To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable Reinforcement Learning
Yuda Song, Dhruv Rohatgi, Aarti Singh, J. Andrew Bagnell
https://arxiv.org/abs/2510.03207
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Jens Tuyls, Dylan J. Foster, Akshay Krishnamurthy, Jordan T. Ash
https://arxiv.org/abs/2510.11686
Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents
Mingkang Zhu, Xi Chen, Bei Yu, Hengshuang Zhao, Jiaya Jia
https://arxiv.org/abs/2510.06214
Learning from Failures: Understanding LLM Alignment through Failure-Aware Inverse RL
Nyal Patel, Matthieu Bou, Arjun Jagota, Satyapriya Krishna, Sonali Parbhoo
https://arxiv.org/abs/2510.06092
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
Prasanna Mayilvahanan, Ricardo Dominguez-Olmedo, Thaddäus Wiedemer, Wieland Brendel
https://arxiv.org/abs/2510.11653
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou
https://arxiv.org/abs/2510.11498
xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning
Cheng Qian, Zuxin Liu, Shirley Kokane, Akshara Prabhakar, Jielin Qiu, Haolin Chen, Zhiwei Liu, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang
https://arxiv.org/abs/2510.08439
A Unified Deep Reinforcement Learning Approach for Close Enough Traveling Salesman Problem
Mingfeng Fan, Jiaqi Cheng, Yaoxin Wu, Yifeng Zhang, Yibin Yang, Guohua Wu, Guillaume Sartoretti
https://arxiv.org/abs/2510.03065
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Guanhua Huang, Tingqiang Xu, Mingze Wang, Qi Yi, Xue Gong, Siheng Li, Ruibin Xiong, Kejiao Li, Yuhao Jiang, Bo Zhou
https://arxiv.org/abs/2510.03222
h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning
Sumeet Ramesh Motwani, Alesia Ivanova, Ziyang Cai, Philip Torr, Riashat Islam, Shital Shah, Christian Schroeder de Witt, Charles London
https://arxiv.org/abs/2510.07312
HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness
Xinyi Wang, Jinyi Han, Zishang Jiang, Tingyun Li, Jiaqing Liang, Sihang Jiang, Zhaoqian Dai, Shuguang Ma, Fei Yu, Yanghua Xiao
https://arxiv.org/abs/2510.09388