
2025-09-01 08:36:33
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Yi Liao, Yu Gu, Yuan Sui, Zining Zhu, Yifan Lu, Guohua Tang, Zhongqian Sun, Wei Yang
https://arxiv.org/abs/2508.21365
Reusable Test Suites for Reinforcement Learning
Jørn Eirik Betten, Quentin Mazouni, Dennis Gross, Pedro Lind, Helge Spieker
https://arxiv.org/abs/2508.21553 https://
Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks
Vira Joshi, Zifan Xu, Bo Liu, Peter Stone, Amy Zhang
https://arxiv.org/abs/2507.23172 ht…
Beyond Prediction: Reinforcement Learning as the Defining Leap in Healthcare AI
Dilruk Perera, Gousia Habib, Qianyi Xu, Daniel J. Tan, Kai He, Erik Cambria, Mengling Feng
https://arxiv.org/abs/2508.21101
Improving Aviation Safety Analysis: Automated HFACS Classification Using Reinforcement Learning with Group Relative Policy Optimization
Arash Ahmadi, Sarah Sharif, Yaser Banad
https://arxiv.org/abs/2508.21201
Reinforcement Learning for Optimizing Large Qubit Array based Quantum Sensor Circuits
Laxmisha Ashok Attisara, Sathish Kumar
https://arxiv.org/abs/2508.21253 https://
Offline Reinforcement Learning for Mobility Robustness Optimization
Pegah Alizadeh, Anastasios Giovanidis, Pradeepa Ramachandra, Vasileios Koutsoukis, Osama Arouk
https://arxiv.org/abs/2506.22793
This article, https://arxiv.org/abs/2308.13135, has been replaced.
initial toot: https://mastoxiv.page/@arXiv_sta…
Breaking the Cold-Start Barrier: Reinforcement Learning with Double and Dueling DQNs
Minda Zhao
https://arxiv.org/abs/2508.21259 https://arxiv.org/pdf/2508…
AutoIndexer: A Reinforcement Learning-Enhanced Index Advisor Towards Scaling Workloads
Taiyi Wang, Eiko Yoneki
https://arxiv.org/abs/2507.23084 https://arx…
Causal-Inspired Multi-Agent Decision-Making via Graph Reinforcement Learning
Jing Wang, Yan Jin, Fei Ding, Chongfeng Wei
https://arxiv.org/abs/2507.23080 https://
DynaMark: A Reinforcement Learning Framework for Dynamic Watermarking in Industrial Machine Tool Controllers
Navid Aftabi, Abhishek Hanchate, Satish Bukkapatnam, Dan Li
https://arxiv.org/abs/2508.21797
Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning
Jinbao Wang, Shiliang Zhang, Jun Liu, Xuehui Ma, Haolin Liu
https://arxiv.org/abs/2505.24572
Learning to Generate Unit Test via Adversarial Reinforcement Learning
Dongjun Lee, Changho Hwang, Kimin Lee
https://arxiv.org/abs/2508.21107 https://arxiv.…
Collaborative Multi-Agent Reinforcement Learning Approach for Elastic Cloud Resource Scaling
Bruce Fang, Danyi Gao
https://arxiv.org/abs/2507.00550 https:/…
Control-Optimized Deep Reinforcement Learning for Artificially Intelligent Autonomous Systems
Oren Fivel, Matan Rudman, Kobi Cohen
https://arxiv.org/abs/2507.00268
Beyond expected value: geometric mean optimization for long-term policy performance in reinforcement learning
Xinyi Sheng, Dominik Baumann
https://arxiv.org/abs/2508.21443 https…
Self-correcting Reward Shaping via Language Models for Reinforcement Learning Agents in Games
António Afonso, Iolanda Leite, Alessandro Sestini, Florian Fuchs, Konrad Tollmar, Linus Gisslén
https://arxiv.org/abs/2506.23626
Machine Intelligence on the Edge: Interpretable Cardiac Pattern Localisation Using Reinforcement Learning
Haozhe Tian, Qiyu Rao, Nina Moutonnet, Pietro Ferraro, Danilo Mandic
https://arxiv.org/abs/2508.21652
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao
https://arxiv.org/abs/2506.22434
Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems
Langming Liu, Wanyu Wang, Chi Zhang, Bo Li, Hongzhi Yin, Xuetao Wei, Wenbo Su, Bo Zheng, Xiangyu Zhao
https://arxiv.org/abs/2506.23090
Deep reinforcement learning for efficient exploration of combinatorial structural design spaces
Chloe S. H. Hong, Keith J. Lee, Caitlin T. Mueller
https://arxiv.org/abs/2507.22804
Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation
Yusuke Tanaka, Alvin Zhu, Quanyou Wang, Dennis Hong
https://arxiv.org/abs/2507.00273
Hierarchical Adversarially-Resilient Multi-Agent Reinforcement Learning for Cyber-Physical Systems Security
Saad Alqithami
https://arxiv.org/abs/2506.22445
Explanations for Unrealizability of Infinite-State Safety Shields
Andoni Rodriguez, Irfansha Shaik, Davide Corsi, Roy Fox, Cesar Sanchez
https://arxiv.org/abs/2507.23603 https:/…
Med-R³: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning
Keer Lu, Zheng Liang, Youquan Li, Jiejun Tan, Da Pan, Shusen Zhang, Guosheng Dong, Huang Leng
https://arxiv.org/abs/2507.23541
Sources: Applied Compute, a pre-launch reinforcement learning startup founded by three former OpenAI staffers, raised $20M at a $100M valuation in a round led by Benchmark (Alex Konrad/Upstarts Media)
https://www.upstartsmedia.com/p/ex-openai-applied-compute-raises-20m…
R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning
Harsh Goel, Mohammad Omama, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Sandeep Chinchali
https://arxiv.org/abs/2505.24265
A Knowledge Distillation-empowered Adaptive Federated Reinforcement Learning Framework for Multi-Domain IoT Applications Scheduling
Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya
https://arxiv.org/abs/2508.21328
Crosslisted article(s) found for eess.IV. https://arxiv.org/list/eess.IV/new
[1/1]:
- QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
Allen Wang, Gavin Tao
Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents
Shaofei Cai, Zhancun Mu, Haiwen Xia, Bowei Zhang, Anji Liu, Yitao Liang
https://arxiv.org/abs/2507.23698
This article, https://arxiv.org/abs/2505.19713, has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csGR_…
A Reinforcement Learning Framework for Some Singular Stochastic Control Problems
Zongxia Liang, Xiaodong Luo, Xiang Yu
https://arxiv.org/abs/2506.22203 htt…
Reliable Transmission of LTP Using Reinforcement Learning-Based Adaptive FEC
Liang Chen, Yu Song, Kanglian Zhao, Juan A. Fraire, Wenfeng Li
https://arxiv.org/abs/2506.22470
Harnessing the Power of Reinforcement Learning for Adaptive MCMC
Congye Wang, Matthew A. Fisher, Heishiro Kanagawa, Wilson Chen, Chris. J. Oates
https://arxiv.org/abs/2507.00671
Replaced article(s) found for stat.ME. https://arxiv.org/list/stat.ME/new
[2/2]:
- Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference
Lars van der Laan, David Hubbard, Allen Tran, Nathan Kallus, Aurélien Bibaut
Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security
Muzhi Dai, Shixuan Liu, Zhiyuan Zhao, Junyu Gao, Hao Sun, Xuelong Li
https://arxiv.org/abs/2507.22037
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
Zijie Xu, Tong Bu, Zecheng Hao, Jianhao Ding, Zhaofei Yu
https://arxiv.org/abs/2505.24161
Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards
Xiaolong Wei, Bo Lu, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin
https://arxiv.org/abs/2508.21476
Multi-Timescale Hierarchical Reinforcement Learning for Unified Behavior and Control of Autonomous Driving
Guizhe Jin, Zhuoren Li, Bo Leng, Ran Yu, Lu Xiong
https://arxiv.org/abs/2506.23771
Priors Matter: Addressing Misspecification in Bayesian Deep Q-Learning
Pascal R. van der Vaart, Neil Yorke-Smith, Matthijs T. J. Spaan
https://arxiv.org/abs/2508.21488 https://
Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction
Alex Durkin, Jasper Stolte, Matthew Jones, Raghuraman Pitchumani, Bei Li, Christian Michler, Mehmet Mercangöz
https://arxiv.org/abs/2507.22640
Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multi-Agent Reinforcement Learning
Pengcheng Dai, Yuanqiu Mo, Wenwu Yu, Wei Ren
https://arxiv.org/abs/2505.24113
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/1]:
- QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
Allen Wang, Gavin Tao
Reward Balancing Revisited: Enhancing Offline Reinforcement Learning for Recommender Systems
Wenzheng Shu, Yanxiang Zeng, Yongxiang Tang, Teng Sha, Ning Luo, Yanhua Cheng, Xialong Liu, Fan Zhou, Peng Jiang
https://arxiv.org/abs/2506.22112
Learning to Drift with Individual Wheel Drive: Maneuvering Autonomous Vehicle at the Handling Limits
Yihan Zhou, Yiwen Lu, Bo Yang, Jiayun Li, Yilin Mo
https://arxiv.org/abs/2507.23339
Noise-Resilient Quantum Reinforcement Learning
Jing-Ci Yue, Jun-Hong An
https://arxiv.org/abs/2508.20601 https://arxiv.org/pdf/2508.20601
Laser Scan Path Design for Controlled Microstructure in Additive Manufacturing with Integrated Reduced-Order Phase-Field Modeling and Deep Reinforcement Learning
Augustine Twumasi, Prokash Chandra Roy, Zixun Li, Soumya Shouvik Bhattacharjee, Zhengtao Gan
https://arxiv.org/abs/2506.21815 …
Replaced article(s) found for stat.ML. https://arxiv.org/list/stat.ML/new
[1/2]:
- Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings
C. Shi, S. Zhang, W. Lu, R. Song
Single Agent Robust Deep Reinforcement Learning for Bus Fleet Control
Yifan Zhang
https://arxiv.org/abs/2508.20784 https://arxiv.org/pdf/2508.20784
Toward Trusted Onboard AI: Advancing Small Satellite Operations using Reinforcement Learning
Cannon Whitney, Joseph Melville
https://arxiv.org/abs/2507.22198 https://
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/5]:
- Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
Wang, Shao, Zhang, Yu, Liu, Ding, Cao, Kang, Wang
Learning to Assemble the Soma Cube with Legal-Action Masked DQN and Safe ZYZ Regrasp on a Doosan M0609
Jaehong Oh, Seungjun Jung, Sawoong Kim
https://arxiv.org/abs/2508.21272 ht…
Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic
Molly Wang
https://arxiv.org/abs/2507.22174 https://arxiv.org/pdf/25…
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Yibin Wang, Zhimin Li, Yuhang Zang, Yujie Zhou, Jiazi Bu, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang
https://arxiv.org/abs/2508.20751
Reducing Hallucinations in Summarization via Reinforcement Learning with Entity Hallucination Index
Praveenkumar Katwe, Rakesh Chandra, Balabantaray Kali, Prasad Vittala
https://arxiv.org/abs/2507.22744
Reactive Aerobatic Flight via Reinforcement Learning
Zhichao Han, Xijie Huang, Zhuxiu Xu, Jiarui Zhang, Yuze Wu, Mingyang Wang, Tianyue Wu, Fei Gao
https://arxiv.org/abs/2505.24396
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[1/4]:
- Policy Expansion for Bridging Offline-to-Online Reinforcement Learning
Haichao Zhang, We Xu, Haonan Yu
From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs
Jie He, Victor Gutierrez Basulto, Jeff Z. Pan
https://arxiv.org/abs/2507.22716
Learning Agile Gate Traversal via Analytical Optimal Policy Gradient
Tianchen Sun, Bingheng Wang, Longbin Tang, Yichao Gao, Lin Zhao
https://arxiv.org/abs/2508.21592 https://
Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games
Yann Kerzreho (ENS Paris Saclay)
https://arxiv.org/abs/2506.21079 https://
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
Zigang Geng, Yibing Wang, Yeyao Ma, Chen Li, Yongming Rao, Shuyang Gu, Zhao Zhong, Qinglin Lu, Han Hu, Xiaosong Zhang, Linus, Di Wang, Jie Jiang
https://arxiv.org/abs/2507.22058
Crosslisted article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[1/5]:
- QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
Allen Wang, Gavin Tao
Reinforcement Learning for Optimal Control of Spin Magnetometers
Logan W. Cooke, Stefanie Czischek
https://arxiv.org/abs/2506.21475 https://
Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance
Kyanna Dagenais, Istvan David
https://arxiv.org/abs/2506.20883 https:…
This article, https://arxiv.org/abs/2505.20751, has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/4]:
- BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement L...
Yunpeng Qing, Shuo Chen, Yixiao Chi, Shunyu Liu, Sixu Lin, Kelu Yao, Changqing Zou
SAFER: Probing Safety in Reward Models with Sparse Autoencoder
Sihang Li, Wei Shi, Ziyuan Xie, Tao Liang, Guojun Ma, Xiang Wang
https://arxiv.org/abs/2507.00665
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[3/5]:
- Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
Wang, Shao, Zhang, Yu, Liu, Ding, Cao, Kang, Wang
Crosslisted article(s) found for eess.SY. https://arxiv.org/list/eess.SY/new
[1/1]:
- QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
Allen Wang, Gavin Tao
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/5]:
- Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Fe...
Chen, Shen, Huang, Zhou, Lin, Cai, Yu, Bu, Shi, Qiao
Post-Training Large Language Models via Reinforcement Learning from Self-Feedback
Carel van Niekerk, Renato Vukovic, Benjamin Matthias Ruppik, Hsien-chin Lin, Milica Gašić
https://arxiv.org/abs/2507.21931
Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model
Zixin Feng, Qunshan Zhao, Alison Heppenstall
https://arxiv.org/abs/2507.21341
Spiking Decision Transformers: Local Plasticity, Phase-Coding, and Dendritic Routing for Low-Power Sequence Control
Vishal Pandey, Debasmita Biswas
https://arxiv.org/abs/2508.21505
AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning
Yifan Wei, Xiaoyan Yu, Yixuan Weng, Tengfei Pan, Angsheng Li, Li Du
https://arxiv.org/abs/2507.21836 ht…
From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning
Honglin He, Yukai Ma, Wayne Wu, Bolei Zhou
https://arxiv.org/abs/2507.22028 https:/…
QoS-aware State-Augmented Learnable Algorithm for Wireless Coexistence Parameter Management
Mohammad Reza Fasihi, Brian L. Mark
https://arxiv.org/abs/2506.22652
AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning
Lang Mei, Zhihan Yang, Chong Chen
https://arxiv.org/abs/2508.20368 https://
Linearly Decoding Refused Knowledge in Aligned Language Models
Aryan Shrivastava, Ari Holtzman
https://arxiv.org/abs/2507.00239 https://
Experimental investigation of pose informed reinforcement learning for skid-steered visual navigation
Ameya Salvi, Venkat Krovi
https://arxiv.org/abs/2506.21732
Cooperative Sensing Enhanced UAV Path-Following and Obstacle Avoidance with Variable Formation
Changheng Wang, Zhiqing Wei, Wangjun Jiang, Haoyue Jiang, Zhiyong Feng
https://arxiv.org/abs/2508.21316
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
Haoran Luo, Haihong E, Guanting Chen, Qika Lin, Yikai Guo, Fangzhi Xu, Zemin Kuang, Meina Song, Xiaobao Wu, Yifan Zhu, Luu Anh Tuan
https://arxiv.org/abs/2507.21892
Real-Time Energy Management Strategies for Community Microgrids
Moslem Uddin, Huadong Mo, Daoyi Dong
https://arxiv.org/abs/2506.22931 https://
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu, Zhantao Ma, Shuai Zhong, Jin Wang, Dahai Yu, Michael K. Ng, Ping Luo
https://arxiv.org/abs/2508.20018 …
Model Predictive Adversarial Imitation Learning for Planning from Observation
Tyler Han, Yanda Bao, Bhaumik Mehta, Gabriel Guo, Anubhav Vishwakarma, Emily Kang, Sanghun Jung, Rosario Scalise, Jason Zhou, Bryan Xu, Byron Boots
https://arxiv.org/abs/2507.21533
Replaced article(s) found for cs.RO. https://arxiv.org/list/cs.RO/new
[2/2]:
- Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
Abdulaziz Almuzairee, Rohan Patil, Dwait Bhatt, Henrik I. Christensen
Double Check My Desired Return: Transformer with Target Alignment for Offline Reinforcement Learning
Yue Pei, Hongming Zhang, Chao Gao, Martin Müller, Mengxiao Zhu, Hao Sheng, Haogang Zhu, Liang Lin
https://arxiv.org/abs/2508.16420
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab
https://arx…
A Hierarchical Signal Coordination and Control System Using a Hybrid Model-based and Reinforcement Learning Approach
Xianyue Peng, Shenyang Chen, H. Michael Zhang
https://arxiv.org/abs/2508.20102
Libra: Assessing and Improving Reward Model by Learning to Think
Meng Zhou, Bei Li, Jiahao Liu, Xiaowen Shi, Yang Bai, Rongxiang Weng, Jingang Wang, Xunliang Cai
https://arxiv.org/abs/2507.21645
Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies
Hugo Garrido-Lestache, Jeremy Kedziora
https://arxiv.org/abs/2507.22782 https://
Active Query Selection for Crowd-Based Reinforcement Learning
Jonathan Erskine, Taku Yamagata, Raúl Santos-Rodríguez
https://arxiv.org/abs/2508.19132 https://…
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Zengzhi Wang, Fan Zhou, Xuefeng Li, Pengfei Liu
https://arxiv.org/abs/2506.20512 http…
DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift
Shae McFadden, Myles Foley, Mario D'Onghia, Chris Hicks, Vasilios Mavroudis, Nicola Paoletti, Fabio Pierazzi
https://arxiv.org/abs/2508.18839
On the Definition of Intelligence
Kei-Sing Ng
https://arxiv.org/abs/2507.22423 https://arxiv.org/pdf/2507.22423
Task Allocation for Autonomous Machines using Computational Intelligence and Deep Reinforcement Learning
Thanh Thi Nguyen, Quoc Viet Hung Nguyen, Jonathan Kua, Imran Razzak, Dung Nguyen, Saeid Nahavandi
https://arxiv.org/abs/2508.20688
Bipedalism for Quadrupedal Robots: Versatile Loco-Manipulation through Risk-Adaptive Reinforcement Learning
Yuyou Zhang, Radu Corcodel, Ding Zhao
https://arxiv.org/abs/2507.20382
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Hinrich Schütze, Volker Tresp, Yunpu Ma
https://arxiv.org/abs/2508.19828
Impedance Primitive-augmented Hierarchical Reinforcement Learning for Sequential Tasks
Amin Berjaoui Tahmaz, Ravi Prakash, Jens Kober
https://arxiv.org/abs/2508.19607 https://…
ReCode: Updating Code API Knowledge with Reinforcement Learning
Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang
https://arxiv.org/abs/2506.20495