
2025-09-22 07:30:51
The Distribution Shift Problem in Transportation Networks using Reinforcement Learning and AI
Federico Taschin, Abderrahmane Lazaraq, Ozan K. Tonguz, Inci Ozgunes
https://arxiv.org/abs/2509.15291
Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations
Yujie Zhu, Charles A. Hepburn, Matthew Thorpe, Giovanni Montana
https://arxiv.org/abs/2509.15981
PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models
Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Tianyu Shao, Guohua Chen, Dominic Kao, Sungeun Hong, Byung-Cheol Min
https://arxiv.org/abs/2509.15607
Quantum Reinforcement Learning with Dynamic-Circuit Qubit Reuse and Grover-Based Trajectory Optimization
Thet Htar Su, Shaswot Shresthamali, Masaaki Kondo
https://arxiv.org/abs/2509.16002
Deep Reinforcement Learning Based Routing for Heterogeneous Multi-Hop Wireless Networks
Brian Kim, Justin H. Kong, Terrence J. Moore, Fikadu T. Dagefu
https://arxiv.org/abs/2508.14884
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
Max Studt, Georg Schildbach
https://arxiv.org/abs/2509.15799
Demystifying Reward Design in Reinforcement Learning for Upper Extremity Interaction: Practical Guidelines for Biomechanical Simulations in HCI
Hannah Selder, Florian Fischer, Per Ola Kristensson, Arthur Fleig
https://arxiv.org/abs/2508.15727
EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
Pengcheng Li, Botao Zhao, Zuheng Kang, Junqing Peng, Xiaoyang Qu, Yayun He, Jianzong Wang
https://arxiv.org/abs/2509.15654
Energy-Efficient Routing Algorithm for Wireless Sensor Networks: A Multi-Agent Reinforcement Learning Approach
Parham Soltani, Mehrshad Eskandarpour, Amir Ahmadizad, Hossein Soleimani
https://arxiv.org/abs/2508.14679
Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning
Xiancheng Gao, Yufeng Shi, Wengang Zhou, Houqiang Li
https://arxiv.org/abs/2508.15327
RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation
Chao Yu, Yuanqing Wang, Zhen Guo, Hao Lin, Si Xu, Hongzhi Zang, Quanlu Zhang, Yongji Wu, Chunyang Zhu, Junhao Hu, Zixiao Huang, Mingjie Wei, Yuqing Xie, Ke Yang, Bo Dai, Zhexuan Xu, Xiangyuan Wang, Xu Fu, Zhihao Liu, Kang Chen, Weilin Liu, Gang Liu, Boxun Li, Jianlei Yang, Zhi Yang, Guohao Dai, Yu Wang
Reinforcement learning entangling operations on spin qubits
Mohammad Abedi, Markus Schmitt
https://arxiv.org/abs/2508.14761
SimGenHOI: Physically Realistic Whole-Body Humanoid-Object Interaction via Generative Modeling and Reinforcement Learning
Yuhang Lin, Yijia Xie, Jiahong Xie, Yuehao Huang, Ruoyu Wang, Jiajun Lv, Yukai Ma, Xingxing Zuo
https://arxiv.org/abs/2508.14120
Hybrid Learning and Optimization methods for solving Capacitated Vehicle Routing Problem
Monit Sharma, Hoong Chuin Lau
https://arxiv.org/abs/2509.15262
Automated Cyber Defense with Generalizable Graph-based Reinforcement Learning Agents
Isaiah J. King, Benjamin Bowman, H. Howie Huang
https://arxiv.org/abs/2509.16151
A Dynamical Systems Framework for Reinforcement Learning Safety and Robustness Verification
Ahmed Nasir, Abdelhafid Zenati
https://arxiv.org/abs/2508.15588
ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs
Hongxin Ding, Baixiang Huang, Yue Fang, Weibin Liao, Xinke Jiang, Zheng Li, Junfeng Zhao, Yasha Wang
https://arxiv.org/abs/2508.13514
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
Yiru Zhang, Hang Su, Lichun Fan, Zhenbo Luo, Jian Luan
https://arxiv.org/abs/2509.15612
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/4]:
- cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
Kolodiazhnyi, Tarasov, Zhemchuzhnikov, Nikulin, Zisman, Vorontsova, Konushin, Kurenkov, Rukhovich
Distributed Detection of Adversarial Attacks in Multi-Agent Reinforcement Learning with Continuous Action Space
Kiarash Kazari, Ezzeldin Shereen, György Dán
https://arxiv.org/abs/2508.15764
Understanding Action Effects through Instrumental Empowerment in Multi-Agent Reinforcement Learning
Ardian Selmonaj, Miroslav Strupl, Oleg Szehr, Alessandro Antonucci
https://arxiv.org/abs/2508.15652
On-Policy Reinforcement-Learning Control for Optimal Energy Sharing and Temperature Regulation in District Heating Systems
Xinyi Yi, Ioannis Lestas
https://arxiv.org/abs/2509.16083
No More Marching: Learning Humanoid Locomotion for Short-Range SE(2) Targets
Pranay Dugar, Mohitvishnu S. Gadde, Jonah Siekmann, Yesh Godse, Aayam Shrestha, Alan Fern
https://arxiv.org/abs/2508.14098
Replaced article(s) found for cs.IR. https://arxiv.org/list/cs.IR/new
[1/1]:
- Reinforcement Learning to Rank Using Coarse-grained Rewards
Yiteng Tu, Zhichao Xu, Tao Yang, Weihang Su, Yujia Zhou, Yiqun Liu, Fen Lin, Qin Liu, Qingyao Ai
LEED: A Highly Efficient and Scalable LLM-Empowered Expert Demonstrations Framework for Multi-Agent Reinforcement Learning
Tianyang Duan, Zongyuan Zhang, Songxiao Guo, Dong Huang, Yuanye Zhao, Zheng Lin, Zihan Fang, Dianxin Luan, Heming Cui, Yong Cui
https://arxiv.org/abs/2509.14680
Adaptive Vision-Based Coverage Optimization in Mobile Wireless Sensor Networks: A Multi-Agent Deep Reinforcement Learning Approach
Parham Soltani, Mehrshad Eskandarpour, Sina Heidari, Farnaz Alizadeh, Hossein Soleimani
https://arxiv.org/abs/2508.14676
NiceWebRL: a Python library for human subject experiments with reinforcement learning environments
Wilka Carvalho, Vikram Goddla, Ishaan Sinha, Hoon Shin, Kunal Jha
https://arxiv.org/abs/2508.15693
PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning
Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang
https://arxiv.org/abs/2508.14765
Reinforcement Learning-Based Market Making as a Stochastic Control on Non-Stationary Limit Order Book Dynamics
Rafael Zimmer, Oswaldo Luiz do Valle Costa
https://arxiv.org/abs/2509.12456
Empathy-R1: A Chain-of-Empathy and Reinforcement Learning Framework for Long-Form Mental Health Support
Xianrong Yao, Dong She, Chenxu Zhang, Yimeng Zhang, Yueru Sun, Noman Ahmed, Yang Gao, Zhanpeng Jin
https://arxiv.org/abs/2509.14851
Breaking Reward Collapse: Adaptive Reinforcement for Open-ended Medical Reasoning with Enhanced Semantic Discrimination
Yizhou Liu, Jingwei Wei, Zizhi Chen, Minghao Han, Xukun Zhang, Keliang Liu, Lihua Zhang
https://arxiv.org/abs/2508.12957
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, Ming-Yu Liu
https://arxiv.org/abs/2509.16117
CCrepairBench: A High-Fidelity Benchmark and Reinforcement Learning Framework for C Compilation Repair
Weixuan Sun, Jucai Zhai, Dengfeng Liu, Xin Zhang, Xiaojun Wu, Qiaobo Hao, AIMgroup, Yang Fang, Jiuyang Tang
https://arxiv.org/abs/2509.15690
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
https://arxiv.org/abs/2508.15746
AI Methods for Permutation Circuit Synthesis Across Generic Topologies
Victor Villar, Juan Cruz-Benito, Ismael Faro, David Kremer
https://arxiv.org/abs/2509.16020
Foundation Models as World Models: A Foundational Study in Text-Based GridWorlds
Remo Sasso, Michelangelo Conserva, Dominik Jeurissen, Paulo Rauber
https://arxiv.org/abs/2509.15915
Constructive Conflict-Driven Multi-Agent Reinforcement Learning for Strategic Diversity
Yuxiang Mai, Qiyue Yin, Wancheng Ni, Pei Xu, Kaiqi Huang
https://arxiv.org/abs/2509.14276
Reinforcement Learning-based Adaptive Path Selection for Programmable Networks
José Eduardo Zerna Torres, Marios Avgeris, Chrysa Papagianni, Gergely Pongrácz, István Gódor, Paola Grosso
https://arxiv.org/abs/2508.13806
Scalable Multi-Objective Robot Reinforcement Learning through Gradient Conflict Resolution
Humphrey Munn, Brendan Tidd, Peter Böhm, Marcus Gallagher, David Howard
https://arxiv.org/abs/2509.14816
Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation
Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Liming Zheng, Yufeng Zhong, Lin Ma
https://arxiv.org/abs/2508.13587
Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models
Yan Chen, Long Li, Teng Xi, Long Zeng, Jingdong Wang
https://arxiv.org/abs/2509.13031
Digital Twin-based Cooperative Autonomous Driving in Smart Intersections: A Multi-Agent Reinforcement Learning Approach
Taoyuan Yu, Kui Wang, Zongdian Li, Tao Yu, Kei Sakaguchi, Walid Saad
https://arxiv.org/abs/2509.15099
REACH: Reinforcement Learning for Efficient Allocation in Community and Heterogeneous Networks
Zhiwei Yu, Chengze Du, Heng Xu, Ying Zhou, Bo Liu, Jialong Li
https://arxiv.org/abs/2508.12857
Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem
Soumyajit Guin, Shalabh Bhatnagar
https://arxiv.org/abs/2508.13963
Centralized Permutation Equivariant Policy for Cooperative Multi-Agent Reinforcement Learning
Zhuofan Xu, Benedikt Bollig, Matthias Függer, Thomas Nowak, Vincent Le Dréau
https://arxiv.org/abs/2508.11706
Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids
Kaizhe Hu, Haochen Shi, Yao He, Weizhuo Wang, C. Karen Liu, Shuran Song
https://arxiv.org/abs/2508.12252
Quantum Reinforcement Learning-Guided Diffusion Model for Image Synthesis via Hybrid Quantum-Classical Generative Model Architectures
Chi-Sheng Chen, En-Jui Kuo
https://arxiv.org/abs/2509.14163
Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback
Wenzhen Yuan, Shengji Tang, Weihao Lin, Jiacheng Ruan, Ganqu Cui, Bo Zhang, Tao Chen, Ting Liu, Yuzhuo Fu, Peng Ye, Lei Bai
https://arxiv.org/abs/2508.12338
Learning from Preferences and Mixed Demonstrations in General Settings
Jason R Brown, Carl Henrik Ek, Robert D Mullins
https://arxiv.org/abs/2508.14027
Toward Better EHR Reasoning in LLMs: Reinforcement Learning with Expert Attention Guidance
Yue Fang, Yuxin Guo, Jiaran Gao, Hongxin Ding, Xinke Jiang, Weibin Liao, Yongxin Xu, Yinghao Zhu, Zhibang Yang, Liantao Ma, Junfeng Zhao, Yasha Wang
https://arxiv.org/abs/2508.13579
MACTAS: Self-Attention-Based Module for Inter-Agent Communication in Multi-Agent Reinforcement Learning
Maciej Wojtala, Bogusz Stefańczyk, Dominik Bogucki, Łukasz Lepak, Jakub Strykowski, Paweł Wawrzyński
https://arxiv.org/abs/2508.13661
Efficient Environment Design for Multi-Robot Navigation via Continuous Control
Jahid Chowdhury Choton, John Woods, William Hsu
https://arxiv.org/abs/2508.14105
Near-Real-Time Resource Slicing for QoS Optimization in 5G O-RAN using Deep Reinforcement Learning
Peihao Yan, Jie Lu, Huacheng Zeng, Y. Thomas Hou
https://arxiv.org/abs/2509.14343
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Simin Li, Zheng Yuwei, Zihao Mao, Linhao Wang, Ruixiao Xu, Chengdong Ma, Xin Yu, Yuqing Ma, Qi Dou, Xin Wang, Jie Luo, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu
https://arxiv.org/abs/2509.15103
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
Xiaoji Zheng, Ziyuan Yang, Yanhao Chen, Yuhang Peng, Yuanrong Tang, Gengyuan Liu, Bokui Chen, Jiangtao Gong
https://arxiv.org/abs/2510.12560
Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios
Yuting Zeng, Zhiwen Zheng, You Zhou, JiaLing Xiao, Yongbin Yu, Manping Fan, Bo Gong, Liyong Ren
https://arxiv.org/abs/2509.15582
Reinforcement Learning with Rubric Anchors
Zenan Huang, Yihong Zhuang, Guoshan Lu, Zeyu Qin, Haokai Xu, Tianyu Zhao, Ru Peng, Jiaqi Hu, Zhanming Shen, Xiaomeng Hu, Xijun Gu, Peiyi Tu, Jiaxin Liu, Wenyu Chen, Yuzhuo Fu, Zhiting Fan, Yanmei Gu, Yuanyuan Wang, Zhengkai Yang, Jianguo Li, Junbo Zhao
https://arxiv.org/abs/2508.12790
Categorical Policies: Multimodal Policy Learning and Exploration in Continuous Control
SM Mazharul Islam, Manfred Huber
https://arxiv.org/abs/2508.13922
Revisiting Diffusion Q-Learning: From Iterative Denoising to One-Step Action Generation
Thanh Nguyen, Chang D. Yoo
https://arxiv.org/abs/2508.13904
RLNVR: Reinforcement Learning from Non-Verified Real-World Rewards
Rohit Krishnan, Jon Evans
https://arxiv.org/abs/2508.12165
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu, Sheng Zhang, Harshit Soora, Furong Huang, Heng Huang, Pratap Tokekar, Ruohan Gao
https://arxiv.org/abs/2508.11049
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Hanyu Lai, Xiao Liu, Yanxiao Zhao, Han Xu, Hanchen Zhang, Bohao Jing, Yanyu Ren, Shuntian Yao, Yuxiao Dong, Jie Tang
https://arxiv.org/abs/2508.14040
SIGN: Safety-Aware Image-Goal Navigation for Autonomous Drones via Reinforcement Learning
Zichen Yan, Rui Huang, Lei He, Shao Guo, Lin Zhao
https://arxiv.org/abs/2508.12394
OPTIC-ER: A Reinforcement Learning Framework for Real-Time Emergency Response and Equitable Resource Allocation in Underserved African Communities
Mary Tonwe
https://arxiv.org/abs/2508.12943
$Agent^2$: An Agent-Generates-Agent Framework for Reinforcement Learning Automation
Yuan Wei, Xiaohan Shan, Ran Miao, Jianmin Li
https://arxiv.org/abs/2509.13368
TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning
Ziyuan Chen, Zhenghui Zhao, Zhangye Han, Miancan Liu, Xianhang Ye, Yiqing Li, Hongbo Min, Jinkui Ren, Xiantao Zhang, Guitao Cao
https://arxiv.org/abs/2509.14172
Reinforcement Learning for Autonomous Point-to-Point UAV Navigation
Salim Oyinlola, Nitesh Subedi, Soumik Sarkar
https://arxiv.org/abs/2509.13943
Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu
https://arxiv.org/abs/2508.14853
Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward
Jiarui Yang, Bin Zhu, Jingjing Chen, Yu-Gang Jiang
https://arxiv.org/abs/2508.11143
Improving Monte Carlo Tree Search for Symbolic Regression
Zhengyao Huang, Daniel Zhengyu Huang, Tiannan Xiao, Dina Ma, Zhenyu Ming, Hao Shi, Yuanhui Wen
https://arxiv.org/abs/2509.15929
RationAnomaly: Log Anomaly Detection with Rationality via Chain-of-Thought and Reinforcement Learning
Song Xu, Yilun Liu, Minggui He, Mingchen Dai, Ziang Chen, Chunguang Zhao, Jingzhou Du, Shimin Tao, Weibin Meng, Shenglin Zhang, Yongqian Sun, Boxing Chen, Daimeng Wei
https://arxiv.org/abs/2509.14693…
Few-shot Vision-based Human Activity Recognition with MLLM-based Visual Reinforcement Learning
Wenqi Zheng, Yutaka Arakawa
https://arxiv.org/abs/2508.10371
Compute-Optimal Scaling for Value-Based Deep RL
Preston Fu, Oleh Rybkin, Zhiyuan Zhou, Michal Nauman, Pieter Abbeel, Sergey Levine, Aviral Kumar
https://arxiv.org/abs/2508.14881
Beyond ReLU: Chebyshev-DQN for Enhanced Deep Q-Networks
Saman Yazdannik, Morteza Tayefi, Shamim Sanisales
https://arxiv.org/abs/2508.14536
The Yokai Learning Environment: Tracking Beliefs Over Space and Time
Constantin Ruhdorfer, Matteo Bortoletto, Andreas Bulling
https://arxiv.org/abs/2508.12480
SEG-Parking: Towards Safe, Efficient, and Generalizable Autonomous Parking via End-to-End Offline Reinforcement Learning
Zewei Yang, Zengqi Peng, Jun Ma
https://arxiv.org/abs/2509.13956
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
Zhiyu Mou, Yiqin Lv, Miao Xu, Cheems Wang, Yixiu Mao, Qichen Ye, Chao Li, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng
https://arxiv.org/abs/2509.15927
Manipulate-to-Navigate: Reinforcement Learning with Visual Affordances and Manipulability Priors
Yuying Zhang, Joni Pajarinen
https://arxiv.org/abs/2508.13151
HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents
Thomas Carta, Clément Romac, Loris Gaven, Pierre-Yves Oudeyer, Olivier Sigaud, Sylvain Lamprier
https://arxiv.org/abs/2508.14751
AFABench: A Generic Framework for Benchmarking Active Feature Acquisition
Valter Schütz, Han Wu, Reza Rezvan, Linus Aronsson, Morteza Haghir Chehreghani
https://arxiv.org/abs/2508.14734
Towards Open-Ended Emotional Support Conversations in LLMs via Reinforcement Learning with Future-Oriented Rewards
Ting Yang, Li Chen, Huimin Wang
https://arxiv.org/abs/2508.12935
Quantum deep reinforcement learning for humanoid robot navigation task
Romerik Lokossou, Birhanu Shimelis Girma, Ozan K. Tonguz, Ahmed Biyabani
https://arxiv.org/abs/2509.11388 …
A Universal Banach–Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training
Johnny R. Zhang (Independent Researcher), Xiaomei Mi (University of Manchester), Gaoyuan Du (Amazon), Qianyi Sun (Microsoft), Shiqi Wang (Meta), Jiaxuan Li (Amazon), Wenhua Zhou (Independent Researcher)
https://arx…
Multi-Group Equivariant Augmentation for Reinforcement Learning in Robot Manipulation
Hongbin Lin, Juan Rojas, Kwok Wai Samuel Au
https://arxiv.org/abs/2508.11204
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Hanyang Chen, Mark Zhao, Rui Yang, Qinwei Ma, Ke Yang, Jiarui Yao, Kangrui Wang, Hao Bai, Zhenhailong Wang, Rui Pan, Mengchao Zhang, Jose Barreiros, Aykut Onol, ChengXiang Zhai, Heng Ji, Manling Li, Huan Zhang, Tong Zhang
https://arxiv…
Online Bayesian Risk-Averse Reinforcement Learning
Yuhao Wang, Enlu Zhou
https://arxiv.org/abs/2509.14077 https://arxiv.org/pdf/2509.14077
Fusing Rewards and Preferences in Reinforcement Learning
Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Matthias Grossglauser
https://arxiv.org/abs/2508.11363
SHaRe-RL: Structured, Interactive Reinforcement Learning for Contact-Rich Industrial Assembly Tasks
Jannick Stranghöner, Philipp Hartmann, Marco Braun, Sebastian Wrede, Klaus Neumann
https://arxiv.org/abs/2509.13949
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Wenhao Zhang, Yuexiang Xie, Yuchang Sun, Yanxi Chen, Guoyin Wang, Yaliang Li, Bolin Ding, Jingren Zhou
https://arxiv.org/abs/2508.11408
GRATE: a Graph transformer-based deep Reinforcement learning Approach for Time-efficient autonomous robot Exploration
Haozhan Ni, Jingsong Liang, Chenyu He, Yuhong Cao, Guillaume Sartoretti
https://arxiv.org/abs/2509.12863
ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
Jia Liu, ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao
https://arxiv.org/abs/2508.11356
Integrating Trajectory Optimization and Reinforcement Learning for Quadrupedal Jumping with Terrain-Adaptive Landing
Renjie Wang, Shangke Lyu, Xin Lang, Wei Xiao, Donglin Wang
https://arxiv.org/abs/2509.12776
How Reinforcement Learning After Next-Token Prediction Facilitates Learning
Nikolaos Tsilivis, Eran Malach, Karen Ullrich, Julia Kempe
https://arxiv.org/abs/2510.11495
Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning
Davide Guidobene, Lorenzo Benedetti, Diego Arapovic
https://arxiv.org/abs/2508.10608
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang
https://arxiv.org/abs/2510.11769
Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
Fernando Martinez, Tao Li, Yingdong Lu, Juntao Chen
https://arxiv.org/abs/2508.07452
Quantum Machine Learning, Quantitative Trading, Reinforcement Learning, Deep Learning
Jun-Hao Chen, Yu-Chien Huang, Yun-Cheng Tsai, Samuel Yen-Chi Chen
https://arxiv.org/abs/2509.09176
$K$-Level Policy Gradients for Multi-Agent Reinforcement Learning
Aryaman Reddi, Gabriele Tiboni, Jan Peters, Carlo D'Eramo
https://arxiv.org/abs/2509.12117
Generalizing Behavior via Inverse Reinforcement Learning with Closed-Form Reward Centroids
Filippo Lazzati, Alberto Maria Metelli
https://arxiv.org/abs/2509.12010
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Dongchun Xie, Yiwei Wang, Xiaodan Liang, Jing Tang
https://arxiv.org/abs/2508.13755