
2025-06-19 08:34:23
Booster Gym: An End-to-End Reinforcement Learning Framework for Humanoid Robot Locomotion
Yushi Wang, Penghui Chen, Xinyu Han, Feng Wu, Mingguo Zhao
https://arxiv.org/abs/2506.15132
Booster Gym: An End-to-End Reinforcement Learning Framework for Humanoid Robot Locomotion
Yushi Wang, Penghui Chen, Xinyu Han, Feng Wu, Mingguo Zhao
https://arxiv.org/abs/2506.15132
Can you see how I learn? Human observers' inferences about Reinforcement Learning agents' learning processes
Bernhard Hilpert, Muhan Hou, Kim Baraka, Joost Broekens
https://arxiv.org/abs/2506.13583
Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks
Yimian Ding, Jingzehua Xu, Guanwen Xie, Shuai Zhang, Yi Li
https://arxiv.org/abs/2506.15082
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
Ring Team, Bin Hu, Cai Chen, Deng Zhao, Ding Liu, Dingnan Jin, Feng Zhu, Hao Dai, Hongzhi Luan, Jia Guo, Jiaming Liu, Jiewei Wu, Jun Mei, Jun Zhou, Junbo Zhao, Junwu Xiong, Kaihong Zhang, Kuan Xu, Lei Liang, Liang Jiang, Liangcheng Fu, Longfei Zheng, Qiang Gao, Qing Cui, Quan Wan, Shaomian Zheng, Shuaicheng Li, Tongkai Yang, Wang Ren, Xiaodong Yan, Xiaopei Wan, Xiaoyun Feng, Xin Zhao, Xinxing Yang, Xinyu …
Research on Optimal Control Problem Based on Reinforcement Learning under Knightian Uncertainty
Ziyu Li, Chen Fei, Weiyin Fei
https://arxiv.org/abs/2506.13207
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning
Gaurav Chaudhary, Wassim Uddin Mondal, Laxmidhar Behera
https://arxiv.org/abs/2506.09574
Dynamic Preference Multi-Objective Reinforcement Learning for Internet Network Management
DongNyeong Heo, Daniela Noemi Rim, Heeyoul Choi
https://arxiv.org/abs/2506.13153
Reconstruction-free magnetic control of DIII-D plasma with deep reinforcement learning
G. F. Subbotin, D. I. Sorokin, M. R. Nurgaliev, A. A. Granovskiy, I. P. Kharitonov, E. V. Adishchev, E. N. Khairutdinov, R. Clark, H. Shen, W. Choi, J. Barr, D. M. Orlov
https://arxiv.org/abs/2506.13267
ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification
Yiyang Jin, Kunzhao Xu, Hang Li, Xueting Han, Yanmin Zhou, Cheng Li, Jing Bai
https://arxiv.org/abs/2506.11442
Quadrotor Morpho-Transition: Learning vs Model-Based Control Strategies
Ioannis Mandralis, Richard M. Murray, Morteza Gharib
https://arxiv.org/abs/2506.14039
QForce-RL: Quantized FPGA-Optimized Reinforcement Learning Compute Engine
Anushka Jha, Tanushree Dewangan, Mukul Lokhande, Santosh Kumar Vishvakarma
https://arxiv.org/abs/2506.07046
RL-Guided MPC for Autonomous Greenhouse Control
Salim Msaad, Murray Harraway, Robert D. McAllister
https://arxiv.org/abs/2506.13278 https://
Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng, Shaohan Huang, Xuekai Zhu, Bo Dai, Wayne Xin Zhao, Zhenliang Zhang, Furu Wei
https://arxiv.org/abs/2506.14758
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
Haochen Zhang, Zhong Zheng, Lingzhou Xue
https://arxiv.org/abs/2506.04626
Attention on flow control: transformer-based reinforcement learning for lift regulation in highly disturbed flows
Zhecheng Liu, Jeff D. Eldredge
https://arxiv.org/abs/2506.10153
Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning
Mohammadamin Moradi, Runyu Jiang, Yingzi Liu, Malvern Madondo, Tianming Wu, James J. Sohn, Xiaofeng Yang, Yasmin Hasan, Zhen Tian
https://arxiv.org/abs/2506.11957…
BASIL: Best-Action Symbolic Interpretable Learning for Evolving Compact RL Policies
Kourosh Shahnazari, Seyed Moein Ayyoubzadeh, Mohammadali Keshtparvar
https://arxiv.org/abs/2506.00328
Pearl: Automatic Code Optimization Using Deep Reinforcement Learning
Djamel Rassem Lamouri, Iheb Nassim Aouadj, Smail Kourta, Riyadh Baghdadi
https://arxiv.org/abs/2506.01880
Policy-Based Trajectory Clustering in Offline Reinforcement Learning
Hao Hu, Xinqi Wang, Simon Shaolei Du
https://arxiv.org/abs/2506.09202 https://
Multi-Loco: Unifying Multi-Embodiment Legged Locomotion via Reinforcement Learning Augmented Diffusion
Shunpeng Yang, Zhen Fu, Zhefeng Cao, Guo Junde, Patrick Wensing, Wei Zhang, Hua Chen
https://arxiv.org/abs/2506.11470
This https://arxiv.org/abs/2503.02189 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csMA_…
CRScore : Reinforcement Learning with Verifiable Tool and AI Feedback for Code Review
Manav Nitin Kapadnis, Atharva Naik, Carolyn Rose
https://arxiv.org/abs/2506.00296
This https://arxiv.org/abs/2404.17589 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li
https://arxiv.org/abs/2506.09942 …
Multi-Task Reward Learning from Human Ratings
Mingkang Wu, Devin White, Evelyn Rose, Vernon Lawhern, Nicholas R Waytowich, Yongcan Cao
https://arxiv.org/abs/2506.09183
Learning optimal treatment strategies for intraoperative hypotension using deep reinforcement learning
Esra Adiyeke, Tianqi Liu, Venkata Sai Dheeraj Naganaboina, Han Li, Tyler J. Loftus, Yuanfang Ren, Benjamin Shickel, Matthew M. Ruppert, Karandeep Singh, Ruogu Fang, Parisa Rashidi, Azra Bihorac, Tezcan Ozrazgat-Baslanti
https://
Joint Modeling for Learning Decision-Making Dynamics in Behavioral Experiments
Yuan Bian, Xingche Guo, Yuanjia Wang
https://arxiv.org/abs/2506.02394 https:…
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
Zijie Xu, Tong Bu, Zecheng Hao, Jianhao Ding, Zhaofei Yu
https://arxiv.org/abs/2505.24161
Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop
Justin Kerr, Kush Hari, Ethan Weber, Chung Min Kim, Brent Yi, Tyler Bonnen, Ken Goldberg, Angjoo Kanazawa
https://arxiv.org/abs/2506.10968
This https://arxiv.org/abs/2506.04168 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Eliciting Reasoning in Language Models with Cognitive Tools
Brown Ebouky, Andrea Bartezzaghi, Mattia Rigotti
https://arxiv.org/abs/2506.12115 https://
Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy
Tonghe Wang, Yining Feng, Xiaofeng Yang
https://arxiv.org/abs/2506.09805
This https://arxiv.org/abs/2505.19641 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
Learning-based primal-dual optimal control of discrete-time stochastic systems with multiplicative noise
Xiushan Jiang, Weihai Zhang
https://arxiv.org/abs/2506.02613
Improving Long-Range Navigation with Spatially-Enhanced Recurrent Memory via End-to-End Reinforcement Learning
Fan Yang, Per Frivik, David Hoeller, Chen Wang, Cesar Cadena, Marco Hutter
https://arxiv.org/abs/2506.05997
A Reinforcement Learning-Based Telematic Routing Protocol for the Internet of Underwater Things
Mohammadhossein Homaei, Mehran Tarif, Agustin Di Bartolo, Oscar Mogollon Gutierrez, Mar Avila
https://arxiv.org/abs/2506.00133
Agnostic Reinforcement Learning: Foundations and Algorithms
Gene Li
https://arxiv.org/abs/2506.01884 https://arxiv.org/pdf/2506.01884…
This https://arxiv.org/abs/2505.00546 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2503.07792 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
Data-assimilated model-informed reinforcement learning
Defne E. Ozan, Andrea N\'ovoa, Georgios Rigas, Luca Magri
https://arxiv.org/abs/2506.01755 https…
MoRE: Mixture of Residual Experts for Humanoid Lifelike Gaits Learning on Complex Terrains
Dewei Wang, Xinmiao Wang, Xinzhe Liu, Jiyuan Shi, Yingnan Zhao, Chenjia Bai, Xuelong Li
https://arxiv.org/abs/2506.08840
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
Fangyu Lei, Jinxiang Meng, Yiming Huang, Tinghong Chen, Yun Zhang, Shizhu He, Jun Zhao, Kang Liu
https://arxiv.org/abs/2506.01710
This https://arxiv.org/abs/2506.03703 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Hierarchical Learning-Enhanced MPC for Safe Crowd Navigation with Heterogeneous Constraints
Huajian Liu, Yixuan Feng, Wei Dong, Kunpeng Fan, Chao Wang, Yongzhuo Gao
https://arxiv.org/abs/2506.09859
This https://arxiv.org/abs/2505.24298 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2503.04280 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This https://arxiv.org/abs/2505.23703 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
This https://arxiv.org/abs/2505.23585 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2506.04147 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This https://arxiv.org/abs/2506.01755 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
This https://arxiv.org/abs/2505.11862 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving
Haochen Liu, Tianyu Li, Haohan Yang, Li Chen, Caojun Wang, Ke Guo, Haochen Tian, Hongchen Li, Hongyang Li, Chen Lv
https://arxiv.org/abs/2506.09800
This https://arxiv.org/abs/2506.03038 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Hybrid Learning for Cold-Start-Aware Microservice Scheduling in Dynamic Edge Environments
Jingxi Lu, Wenhao Li, Jianxiong Guo, Xingjian Ding, Zhiqing Tang, Tian Wang, Weijia Jia
https://arxiv.org/abs/2505.22424
This https://arxiv.org/abs/2506.00691 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2409.17469 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This https://arxiv.org/abs/2506.02841 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
Learning to Explore: An In-Context Learning Approach for Pure Exploration
Alessio Russo, Ryan Welch, Aldo Pacchiano
https://arxiv.org/abs/2506.01876 https:…
This https://arxiv.org/abs/2504.14870 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
This https://arxiv.org/abs/2505.24034 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving
Dawood Wasif, Terrence J Moore, Chandan K Reddy, Jin-Hee Cho
https://arxiv.org/abs/2506.00819
SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
Jiaheng Hu, Peter Stone, Roberto Mart\'in-Mart\'in
https://arxiv.org/abs/2506.04147
Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods
Tom Danino, Nahum Shimkin
https://arxiv.org/abs/2506.02841 https://
Reinforcement Learning with Data Bootstrapping for Dynamic Subgoal Pursuit in Humanoid Robot Navigation
Chengyang Peng, Zhihao Zhang, Shiting Gong, Sankalp Agrawal, Keith A. Redmill, Ayonga Hereid
https://arxiv.org/abs/2506.02206
This https://arxiv.org/abs/2412.05718 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
This https://arxiv.org/abs/2505.23527 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2505.22642 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This https://arxiv.org/abs/2505.23527 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2411.14622 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This https://arxiv.org/abs/2501.07985 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This https://arxiv.org/abs/2308.13140 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This https://arxiv.org/abs/2505.21119 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2503.18616 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This https://arxiv.org/abs/2505.16401 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2505.20751 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
Disturbance-Aware Adaptive Compensation in Hybrid Force-Position Locomotion Policy for Legged Robots
Yang Zhang, Buqing Nie, Zhanxiang Cao, Yangqing Fu, Yue Gao
https://arxiv.org/abs/2506.00472
This https://arxiv.org/abs/2504.18253 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
This https://arxiv.org/abs/2505.18780 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
AURA: Agentic Upskilling via Reinforced Abstractions
Alvin Zhu, Yusuke Tanaka, Dennis Hong
https://arxiv.org/abs/2506.02507 https://a…
Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games
Alejandro Sanchez Roncero, Olov Andersson, Petter Ogren
https://arxiv.org/abs/2506.02849 …