Tootfinder

Opt-in global Mastodon full text search.

@arXiv_csAI_bot@mastoxiv.page
2025-09-22 07:30:51

The Distribution Shift Problem in Transportation Networks using Reinforcement Learning and AI
Federico Taschin, Abderrahmane Lazaraq, Ozan K. Tonguz, Inci Ozgunes
arxiv.org/abs/2509.15291

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:26:51

Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations
Yujie Zhu, Charles A. Hepburn, Matthew Thorpe, Giovanni Montana
arxiv.org/abs/2509.15981

@arXiv_csRO_bot@mastoxiv.page
2025-09-22 09:53:01

PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models
Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Tianyu Shao, Guohua Chen, Dominic Kao, Sungeun Hong, Byung-Cheol Min
arxiv.org/abs/2509.15607

@arXiv_quantph_bot@mastoxiv.page
2025-09-22 10:16:11

Quantum Reinforcement Learning with Dynamic-Circuit Qubit Reuse and Grover-Based Trajectory Optimization
Thet Htar Su, Shaswot Shresthamali, Masaaki Kondo
arxiv.org/abs/2509.16002

@arXiv_eessSP_bot@mastoxiv.page
2025-08-21 08:58:59

Deep Reinforcement Learning Based Routing for Heterogeneous Multi-Hop Wireless Networks
Brian Kim, Justin H. Kong, Terrence J. Moore, Fikadu T. Dagefu
arxiv.org/abs/2508.14884

@arXiv_eessSY_bot@mastoxiv.page
2025-09-22 09:09:41

Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
Max Studt, Georg Schildbach
arxiv.org/abs/2509.15799 arxiv.o…

@arXiv_csHC_bot@mastoxiv.page
2025-08-22 09:32:31

Demystifying Reward Design in Reinforcement Learning for Upper Extremity Interaction: Practical Guidelines for Biomechanical Simulations in HCI
Hannah Selder, Florian Fischer, Per Ola Kristensson, Arthur Fleig
arxiv.org/abs/2508.15727

@arXiv_csSD_bot@mastoxiv.page
2025-09-22 09:45:01

EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
Pengcheng Li, Botao Zhao, Zuheng Kang, Junqing Peng, Xiaoyang Qu, Yayun He, Jianzong Wang
arxiv.org/abs/2509.15654

@arXiv_csNI_bot@mastoxiv.page
2025-08-21 09:43:40

Energy-Efficient Routing Algorithm for Wireless Sensor Networks: A Multi-Agent Reinforcement Learning Approach
Parham Soltani, Mehrshad Eskandarpour, Amir Ahmadizad, Hossein Soleimani
arxiv.org/abs/2508.14679

@arXiv_csAI_bot@mastoxiv.page
2025-08-22 09:48:31

Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning
Xiancheng Gao, Yufeng Shi, Wengang Zhou, Houqiang Li
arxiv.org/abs/2508.15327

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:25:51

RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation
Chao Yu, Yuanqing Wang, Zhen Guo, Hao Lin, Si Xu, Hongzhi Zang, Quanlu Zhang, Yongji Wu, Chunyang Zhu, Junhao Hu, Zixiao Huang, Mingjie Wei, Yuqing Xie, Ke Yang, Bo Dai, Zhexuan Xu, Xiangyuan Wang, Xu Fu, Zhihao Liu, Kang Chen, Weilin Liu, Gang Liu, Boxun Li, Jianlei Yang, Zhi Yang, Guohao Dai, Yu Wang

@arXiv_quantph_bot@mastoxiv.page
2025-08-21 10:04:20

Reinforcement learning entangling operations on spin qubits
Mohammad Abedi, Markus Schmitt
arxiv.org/abs/2508.14761 arxiv.org/pdf/2508.1476…

@arXiv_csRO_bot@mastoxiv.page
2025-08-21 09:16:49

SimGenHOI: Physically Realistic Whole-Body Humanoid-Object Interaction via Generative Modeling and Reinforcement Learning
Yuhang Lin, Yijia Xie, Jiahong Xie, Yuehao Huang, Ruoyu Wang, Jiajun Lv, Yukai Ma, Xingxing Zuo
arxiv.org/abs/2508.14120

@arXiv_physicssocph_bot@mastoxiv.page
2025-09-22 09:38:01

Hybrid Learning and Optimization methods for solving Capacitated Vehicle Routing Problem
Monit Sharma, Hoong Chuin Lau
arxiv.org/abs/2509.15262

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:33:21

Automated Cyber Defense with Generalizable Graph-based Reinforcement Learning Agents
Isaiah J. King, Benjamin Bowman, H. Howie Huang
arxiv.org/abs/2509.16151

@arXiv_csAI_bot@mastoxiv.page
2025-08-22 10:00:51

A Dynamical Systems Framework for Reinforcement Learning Safety and Robustness Verification
Ahmed Nasir, Abdelhafid Zenati
arxiv.org/abs/2508.15588

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 08:18:59

ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs
Hongxin Ding, Baixiang Huang, Yue Fang, Weibin Liao, Xinke Jiang, Zheng Li, Junfeng Zhao, Yasha Wang
arxiv.org/abs/2508.13514

@arXiv_csSD_bot@mastoxiv.page
2025-09-22 09:27:11

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
Yiru Zhang, Hang Su, Lichun Fan, Zhenbo Luo, Jian Luan
arxiv.org/abs/2509.15612

@arXiv_csCV_bot@mastoxiv.page
2025-09-22 14:08:58

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[3/4]:
- cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
Kolodiazhnyi, Tarasov, Zhemchuzhnikov, Nikulin, Zisman, Vorontsova, Konushin, Kurenkov, Rukhovich

@arXiv_csLG_bot@mastoxiv.page
2025-08-22 10:19:51

Distributed Detection of Adversarial Attacks in Multi-Agent Reinforcement Learning with Continuous Action Space
Kiarash Kazari, Ezzeldin Shereen, György Dán
arxiv.org/abs/2508.15764

@arXiv_csAI_bot@mastoxiv.page
2025-08-22 10:02:31

Understanding Action Effects through Instrumental Empowerment in Multi-Agent Reinforcement Learning
Ardian Selmonaj, Miroslav Strupl, Oleg Szehr, Alessandro Antonucci
arxiv.org/abs/2508.15652

@arXiv_eessSY_bot@mastoxiv.page
2025-09-22 09:44:11

On-Policy Reinforcement-Learning Control for Optimal Energy Sharing and Temperature Regulation in District Heating Systems
Xinyi Yi, Ioannis Lestas
arxiv.org/abs/2509.16083

@arXiv_csRO_bot@mastoxiv.page
2025-08-21 08:38:40

No More Marching: Learning Humanoid Locomotion for Short-Range SE(2) Targets
Pranay Dugar, Mohitvishnu S. Gadde, Jonah Siekmann, Yesh Godse, Aayam Shrestha, Alan Fern
arxiv.org/abs/2508.14098

@arXiv_csIR_bot@mastoxiv.page
2025-08-21 11:52:59

Replaced article(s) found for cs.IR. arxiv.org/list/cs.IR/new
[1/1]:
- Reinforcement Learning to Rank Using Coarse-grained Rewards
Yiteng Tu, Zhichao Xu, Tao Yang, Weihang Su, Yujia Zhou, Yiqun Liu, Fen Lin, Qin Liu, Qingyao Ai

@arXiv_csMA_bot@mastoxiv.page
2025-09-19 08:23:01

LEED: A Highly Efficient and Scalable LLM-Empowered Expert Demonstrations Framework for Multi-Agent Reinforcement Learning
Tianyang Duan, Zongyuan Zhang, Songxiao Guo, Dong Huang, Yuanye Zhao, Zheng Lin, Zihan Fang, Dianxin Luan, Heming Cui, Yong Cui
arxiv.org/abs/2509.14680

@arXiv_csNI_bot@mastoxiv.page
2025-08-21 09:36:40

Adaptive Vision-Based Coverage Optimization in Mobile Wireless Sensor Networks: A Multi-Agent Deep Reinforcement Learning Approach
Parham Soltani, Mehrshad Eskandarpour, Sina Heidari, Farnaz Alizadeh, Hossein Soleimani
arxiv.org/abs/2508.14676

@arXiv_csAI_bot@mastoxiv.page
2025-08-22 10:06:31

NiceWebRL: a Python library for human subject experiments with reinforcement learning environments
Wilka Carvalho, Vikram Goddla, Ishaan Sinha, Hoon Shin, Kunal Jha
arxiv.org/abs/2508.15693

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:15:20

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning
Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang
arxiv.org/abs/2508.14765

@arXiv_qfinTR_bot@mastoxiv.page
2025-09-17 08:35:10

Reinforcement Learning-Based Market Making as a Stochastic Control on Non-Stationary Limit Order Book Dynamics
Rafael Zimmer, Oswaldo Luiz do Valle Costa
arxiv.org/abs/2509.12456

@arXiv_csCL_bot@mastoxiv.page
2025-09-19 10:30:21

Empathy-R1: A Chain-of-Empathy and Reinforcement Learning Framework for Long-Form Mental Health Support
Xianrong Yao, Dong She, Chenxu Zhang, Yimeng Zhang, Yueru Sun, Noman Ahmed, Yang Gao, Zhanpeng Jin
arxiv.org/abs/2509.14851

@arXiv_csCV_bot@mastoxiv.page
2025-08-19 12:05:10

Breaking Reward Collapse: Adaptive Reinforcement for Open-ended Medical Reasoning with Enhanced Semantic Discrimination
Yizhou Liu, Jingwei Wei, Zizhi Chen, Minghao Han, Xukun Zhang, Keliang Liu, Lihua Zhang
arxiv.org/abs/2508.12957

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:31:31

DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, Ming-Yu Liu
arxiv.org/abs/2509.16117

@arXiv_csAI_bot@mastoxiv.page
2025-09-22 08:14:31

CCrepairBench: A High-Fidelity Benchmark and Reinforcement Learning Framework for C Compilation Repair
Weixuan Sun, Jucai Zhai, Dengfeng Liu, Xin Zhang, Xiaojun Wu, Qiaobo Hao, AIMgroup, Yang Fang, Jiuyang Tang
arxiv.org/abs/2509.15690

@arXiv_csCL_bot@mastoxiv.page
2025-08-22 10:15:01

End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
arxiv.org/abs/2508.15746

@arXiv_quantph_bot@mastoxiv.page
2025-09-22 10:16:41

AI Methods for Permutation Circuit Synthesis Across Generic Topologies
Victor Villar, Juan Cruz-Benito, Ismael Faro, David Kremer
arxiv.org/abs/2509.16020

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:22:01

Foundation Models as World Models: A Foundational Study in Text-Based GridWorlds
Remo Sasso, Michelangelo Conserva, Dominik Jeurissen, Paulo Rauber
arxiv.org/abs/2509.15915

@arXiv_csMA_bot@mastoxiv.page
2025-09-19 08:19:31

Constructive Conflict-Driven Multi-Agent Reinforcement Learning for Strategic Diversity
Yuxiang Mai, Qiyue Yin, Wancheng Ni, Pei Xu, Kaiqi Huang
arxiv.org/abs/2509.14276

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:13:10

Reinforcement Learning-based Adaptive Path Selection for Programmable Networks
José Eduardo Zerna Torres, Marios Avgeris, Chrysa Papagianni, Gergely Pongrácz, István Gódor, Paola Grosso
arxiv.org/abs/2508.13806

@arXiv_csRO_bot@mastoxiv.page
2025-09-19 10:10:51

Scalable Multi-Objective Robot Reinforcement Learning through Gradient Conflict Resolution
Humphrey Munn, Brendan Tidd, Peter Böhm, Marcus Gallagher, David Howard
arxiv.org/abs/2509.14816

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 09:52:50

Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation
Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Liming Zheng, Yufeng Zhong, Lin Ma
arxiv.org/abs/2508.13587

@arXiv_csCV_bot@mastoxiv.page
2025-09-17 10:52:50

Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models
Yan Chen, Long Li, Teng Xi, Long Zeng, Jingdong Wang
arxiv.org/abs/2509.13031

@arXiv_eessSY_bot@mastoxiv.page
2025-09-19 08:43:21

Digital Twin-based Cooperative Autonomous Driving in Smart Intersections: A Multi-Agent Reinforcement Learning Approach
Taoyuan Yu, Kui Wang, Zongdian Li, Tao Yu, Kei Sakaguchi, Walid Saad
arxiv.org/abs/2509.15099

@arXiv_csNI_bot@mastoxiv.page
2025-08-19 10:09:40

REACH: Reinforcement Learning for Efficient Allocation in Community and Heterogeneous Networks
Zhiwei Yu, Chengze Du, Heng Xu, Ying Zhou, Bo Liu, Jialong Li
arxiv.org/abs/2508.12857

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:17:20

Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem
Soumyajit Guin, Shalabh Bhatnagar
arxiv.org/abs/2508.13963

@arXiv_csMA_bot@mastoxiv.page
2025-08-19 08:01:20

Centralized Permutation Equivariant Policy for Cooperative Multi-Agent Reinforcement Learning
Zhuofan Xu, Benedikt Bollig, Matthias Függer, Thomas Nowak, Vincent Le Dréau
arxiv.org/abs/2508.11706

@arXiv_csRO_bot@mastoxiv.page
2025-08-19 11:02:40

Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids
Kaizhe Hu, Haochen Shi, Yao He, Weizhuo Wang, C. Karen Liu, Shuran Song
arxiv.org/abs/2508.12252

@arXiv_quantph_bot@mastoxiv.page
2025-09-18 10:08:31

Quantum Reinforcement Learning-Guided Diffusion Model for Image Synthesis via Hybrid Quantum-Classical Generative Model Architectures
Chi-Sheng Chen, En-Jui Kuo
arxiv.org/abs/2509.14163

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:19:50

Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback
Wenzhen Yuan, Shengji Tang, Weihao Lin, Jiacheng Ruan, Ganqu Cui, Bo Zhang, Tao Chen, Ting Liu, Yuzhuo Fu, Peng Ye, Lei Bai
arxiv.org/abs/2508.12338

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:20:20

Learning from Preferences and Mixed Demonstrations in General Settings
Jason R Brown, Carl Henrik Ek, Robert D Mullins
arxiv.org/abs/2508.14027

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 09:50:20

Toward Better EHR Reasoning in LLMs: Reinforcement Learning with Expert Attention Guidance
Yue Fang, Yuxin Guo, Jiaran Gao, Hongxin Ding, Xinke Jiang, Weibin Liao, Yongxin Xu, Yinghao Zhu, Zhibang Yang, Liantao Ma, Junfeng Zhao, Yasha Wang
arxiv.org/abs/2508.13579

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:07:40

MACTAS: Self-Attention-Based Module for Inter-Agent Communication in Multi-Agent Reinforcement Learning
Maciej Wojtala, Bogusz Stefańczyk, Dominik Bogucki, Łukasz Lepak, Jakub Strykowski, Paweł Wawrzyński
arxiv.org/abs/2508.13661

@arXiv_csRO_bot@mastoxiv.page
2025-08-21 09:15:39

Efficient Environment Design for Multi-Robot Navigation via Continuous Control
Jahid Chowdhury Choton, John Woods, William Hsu
arxiv.org/abs/2508.14105

@arXiv_eessSY_bot@mastoxiv.page
2025-09-19 07:55:51

Near-Real-Time Resource Slicing for QoS Optimization in 5G O-RAN using Deep Reinforcement Learning
Peihao Yan, Jie Lu, Huacheng Zeng, Y. Thomas Hou
arxiv.org/abs/2509.14343

@arXiv_csMA_bot@mastoxiv.page
2025-09-19 08:24:21

Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Simin Li, Zheng Yuwei, Zihao Mao, Linhao Wang, Ruixiao Xu, Chengdong Ma, Xin Yu, Yuqing Ma, Qi Dou, Xin Wang, Jie Luo, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu
arxiv.org/abs/2509.15103

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:44:21

CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
Xiaoji Zheng, Ziyuan Yang, Yanhao Chen, Yuhang Peng, Yuanrong Tang, Gengyuan Liu, Bokui Chen, Jiangtao Gong
arxiv.org/abs/2510.12560

@arXiv_csRO_bot@mastoxiv.page
2025-09-22 09:41:01

Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios
Yuting Zeng, Zhiwen Zheng, You Zhou, JiaLing Xiao, Yongbin Yu, Manping Fan, Bo Gong, Liyong Ren
arxiv.org/abs/2509.15582

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:53:40

Reinforcement Learning with Rubric Anchors
Zenan Huang, Yihong Zhuang, Guoshan Lu, Zeyu Qin, Haokai Xu, Tianyu Zhao, Ru Peng, Jiaqi Hu, Zhanming Shen, Xiaomeng Hu, Xijun Gu, Peiyi Tu, Jiaxin Liu, Wenyu Chen, Yuzhuo Fu, Zhiting Fan, Yanmei Gu, Yuanyuan Wang, Zhengkai Yang, Jianguo Li, Junbo Zhao
arxiv.org/abs/2508.12790

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:16:20

Categorical Policies: Multimodal Policy Learning and Exploration in Continuous Control
SM Mazharul Islam, Manfred Huber
arxiv.org/abs/2508.13922

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:15:40

Revisiting Diffusion Q-Learning: From Iterative Denoising to One-Step Action Generation
Thanh Nguyen, Chang D. Yoo
arxiv.org/abs/2508.13904

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:00:30

RLNVR: Reinforcement Learning from Non-Verified Real-World Rewards
Rohit Krishnan, Jon Evans
arxiv.org/abs/2508.12165 arxiv.org/pdf/2508.12…

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 08:35:20

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu, Sheng Zhang, Harshit Soora, Furong Huang, Heng Huang, Pratap Tokekar, Ruohan Gao
arxiv.org/abs/2508.11049

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 10:12:10

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Hanyu Lai, Xiao Liu, Yanxiao Zhao, Han Xu, Hanchen Zhang, Bohao Jing, Yanyu Ren, Shuntian Yao, Yuxiao Dong, Jie Tang
arxiv.org/abs/2508.14040

@arXiv_csRO_bot@mastoxiv.page
2025-08-19 11:11:20

SIGN: Safety-Aware Image-Goal Navigation for Autonomous Drones via Reinforcement Learning
Zichen Yan, Rui Huang, Lei He, Shao Guo, Lin Zhao
arxiv.org/abs/2508.12394

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 11:10:20

OPTIC-ER: A Reinforcement Learning Framework for Real-Time Emergency Response and Equitable Resource Allocation in Underserved African Communities
Mary Tonwe
arxiv.org/abs/2508.12943

@arXiv_csAI_bot@mastoxiv.page
2025-09-18 08:19:31

$Agent^2$: An Agent-Generates-Agent Framework for Reinforcement Learning Automation
Yuan Wei, Xiaohan Shan, Ran Miao, Jianmin Li
arxiv.org/abs/2509.13368

@arXiv_csLG_bot@mastoxiv.page
2025-09-18 10:14:11

TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning
Ziyuan Chen, Zhenghui Zhao, Zhangye Han, Miancan Liu, Xianhang Ye, Yiqing Li, Hongbo Min, Jinkui Ren, Xiantao Zhang, Guitao Cao
arxiv.org/abs/2509.14172

@arXiv_csRO_bot@mastoxiv.page
2025-09-18 10:07:51

Reinforcement Learning for Autonomous Point-to-Point UAV Navigation
Salim Oyinlola, Nitesh Subedi, Soumik Sarkar
arxiv.org/abs/2509.13943 a…

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:17:00

Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu
arxiv.org/abs/2508.14853

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 09:07:50

Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward
Jiarui Yang, Bin Zhu, Jingjing Chen, Yu-Gang Jiang
arxiv.org/abs/2508.11143

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:22:21

Improving Monte Carlo Tree Search for Symbolic Regression
Zhengyao Huang, Daniel Zhengyu Huang, Tiannan Xiao, Dina Ma, Zhenyu Ming, Hao Shi, Yuanhui Wen
arxiv.org/abs/2509.15929

@arXiv_csAI_bot@mastoxiv.page
2025-09-19 09:13:31

RationAnomaly: Log Anomaly Detection with Rationality via Chain-of-Thought and Reinforcement Learning
Song Xu, Yilun Liu, Minggui He, Mingchen Dai, Ziang Chen, Chunguang Zhao, Jingzhou Du, Shimin Tao, Weibin Meng, Shenglin Zhang, Yongqian Sun, Boxing Chen, Daimeng Wei
arxiv.org/abs/2509.14693

@arXiv_csRO_bot@mastoxiv.page
2025-08-15 08:50:22

Few-shot Vision-based Human Activity Recognition with MLLM-based Visual Reinforcement Learning
Wenqi Zheng, Yutaka Arakawa
arxiv.org/abs/2508.10371

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:17:30

Compute-Optimal Scaling for Value-Based Deep RL
Preston Fu, Oleh Rybkin, Zhiyuan Zhou, Michal Nauman, Pieter Abbeel, Sergey Levine, Aviral Kumar
arxiv.org/abs/2508.14881

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:08:50

Beyond ReLU: Chebyshev-DQN for Enhanced Deep Q-Networks
Saman Yazdannik, Morteza Tayefi, Shamim Sanisales
arxiv.org/abs/2508.14536 arxiv.or…

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:31:40

The Yokai Learning Environment: Tracking Beliefs Over Space and Time
Constantin Ruhdorfer, Matteo Bortoletto, Andreas Bulling
arxiv.org/abs/2508.12480

@arXiv_csRO_bot@mastoxiv.page
2025-09-18 10:10:41

SEG-Parking: Towards Safe, Efficient, and Generalizable Autonomous Parking via End-to-End Offline Reinforcement Learning
Zewei Yang, Zengqi Peng, Jun Ma
arxiv.org/abs/2509.13956

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:22:11

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
Zhiyu Mou, Yiqin Lv, Miao Xu, Cheems Wang, Yixiu Mao, Qichen Ye, Chao Li, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng
arxiv.org/abs/2509.15927

@arXiv_csRO_bot@mastoxiv.page
2025-08-19 11:29:30

Manipulate-to-Navigate: Reinforcement Learning with Visual Affordances and Manipulability Priors
Yuying Zhang, Joni Pajarinen
arxiv.org/abs/2508.13151

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:15:10

HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents
Thomas Carta, Clément Romac, Loris Gaven, Pierre-Yves Oudeyer, Olivier Sigaud, Sylvain Lamprier
arxiv.org/abs/2508.14751

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:13:50

AFABench: A Generic Framework for Benchmarking Active Feature Acquisition
Valter Schütz, Han Wu, Reza Rezvan, Linus Aronsson, Morteza Haghir Chehreghani
arxiv.org/abs/2508.14734

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 11:08:40

Towards Open-Ended Emotional Support Conversations in LLMs via Reinforcement Learning with Future-Oriented Rewards
Ting Yang, Li Chen, Huimin Wang
arxiv.org/abs/2508.12935

@arXiv_csRO_bot@mastoxiv.page
2025-09-16 11:25:26

Quantum deep reinforcement learning for humanoid robot navigation task
Romerik Lokossou, Birhanu Shimelis Girma, Ozan K. Tonguz, Ahmed Biyabani
arxiv.org/abs/2509.11388

@arXiv_csLG_bot@mastoxiv.page
2025-09-18 10:15:41

A Universal Banach–Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training
Johnny R. Zhang (Independent Researcher), Xiaomei Mi (University of Manchester), Gaoyuan Du (Amazon), Qianyi Sun (Microsoft), Shiqi Wang (Meta), Jiaxuan Li (Amazon), Wenhua Zhou (Independent Researcher)
arx…

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 09:09:00

Multi-Group Equivariant Augmentation for Reinforcement Learning in Robot Manipulation
Hongbin Lin, Juan Rojas, Kwok Wai Samuel Au
arxiv.org/abs/2508.11204

@arXiv_csAI_bot@mastoxiv.page
2025-10-15 10:22:21

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Hanyang Chen, Mark Zhao, Rui Yang, Qinwei Ma, Ke Yang, Jiarui Yao, Kangrui Wang, Hao Bai, Zhenhailong Wang, Rui Pan, Mengchao Zhang, Jose Barreiros, Aykut Onol, ChengXiang Zhai, Heng Ji, Manling Li, Huan Zhang, Tong Zhang
arxiv…

@arXiv_csLG_bot@mastoxiv.page
2025-09-18 10:11:21

Online Bayesian Risk-Averse Reinforcement Learning
Yuhao Wang, Enlu Zhou
arxiv.org/abs/2509.14077 arxiv.org/pdf/2509.14077

@arXiv_csLG_bot@mastoxiv.page
2025-08-18 09:41:10

Fusing Rewards and Preferences in Reinforcement Learning
Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Matthias Grossglauser
arxiv.org/abs/2508.11363

@arXiv_csRO_bot@mastoxiv.page
2025-09-18 10:08:41

SHaRe-RL: Structured, Interactive Reinforcement Learning for Contact-Rich Industrial Assembly Tasks
Jannick Stranghöner, Philipp Hartmann, Marco Braun, Sebastian Wrede, Klaus Neumann
arxiv.org/abs/2509.13949

@arXiv_csLG_bot@mastoxiv.page
2025-08-18 09:43:10

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Wenhao Zhang, Yuexiang Xie, Yuchang Sun, Yanxi Chen, Guoyin Wang, Yaliang Li, Bolin Ding, Jingren Zhou
arxiv.org/abs/2508.11408

@arXiv_csRO_bot@mastoxiv.page
2025-09-17 10:31:10

GRATE: a Graph transformer-based deep Reinforcement learning Approach for Time-efficient autonomous robot Exploration
Haozhan Ni, Jingsong Liang, Chenyu He, Yuhong Cao, Guillaume Sartoretti
arxiv.org/abs/2509.12863

@arXiv_csLG_bot@mastoxiv.page
2025-08-18 09:39:20

ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
Jia Liu, ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao
arxiv.org/abs/2508.11356

@arXiv_csRO_bot@mastoxiv.page
2025-09-17 10:22:40

Integrating Trajectory Optimization and Reinforcement Learning for Quadrupedal Jumping with Terrain-Adaptive Landing
Renjie Wang, Shangke Lyu, Xin Lang, Wei Xiao, Donglin Wang
arxiv.org/abs/2509.12776

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:37:08

How Reinforcement Learning After Next-Token Prediction Facilitates Learning
Nikolaos Tsilivis, Eran Malach, Karen Ullrich, Julia Kempe
arxiv.org/abs/2510.11495

@arXiv_csLG_bot@mastoxiv.page
2025-08-15 10:12:02

Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning
Davide Guidobene, Lorenzo Benedetti, Diego Arapovic
arxiv.org/abs/2508.10608

@arXiv_csLG_bot@mastoxiv.page
2025-10-15 08:21:22

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang
arxiv.org/abs/2510.11769

@arXiv_csLG_bot@mastoxiv.page
2025-08-12 12:07:03

Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
Fernando Martinez, Tao Li, Yingdong Lu, Juntao Chen
arxiv.org/abs/2508.07452

@arXiv_csLG_bot@mastoxiv.page
2025-09-12 09:53:39

Quantum Machine Learning, Quantitative Trading, Reinforcement Learning, Deep Learning
Jun-Hao Chen, Yu-Chien Huang, Yun-Cheng Tsai, Samuel Yen-Chi Chen
arxiv.org/abs/2509.09176

@arXiv_csLG_bot@mastoxiv.page
2025-09-16 12:45:07

$K$-Level Policy Gradients for Multi-Agent Reinforcement Learning
Aryaman Reddi, Gabriele Tiboni, Jan Peters, Carlo D'Eramo
arxiv.org/abs/2509.12117

@arXiv_csLG_bot@mastoxiv.page
2025-09-16 12:40:37

Generalizing Behavior via Inverse Reinforcement Learning with Closed-Form Reward Centroids
Filippo Lazzati, Alberto Maria Metelli
arxiv.org/abs/2509.12010

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:12:20

Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Dongchun Xie, Yiwei Wang, Xiaodan Liang, Jing Tang
arxiv.org/abs/2508.13755