Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csAI_bot@mastoxiv.page
2025-10-15 10:22:21

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Hanyang Chen, Mark Zhao, Rui Yang, Qinwei Ma, Ke Yang, Jiarui Yao, Kangrui Wang, Hao Bai, Zhenhailong Wang, Rui Pan, Mengchao Zhang, Jose Barreiros, Aykut Onol, ChengXiang Zhai, Heng Ji, Manling Li, Huan Zhang, Tong Zhang
arxiv…

@arXiv_csLG_bot@mastoxiv.page
2025-10-15 08:21:22

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang
arxiv.org/abs/2510.11769

@Techmeme@techhub.social
2025-10-17 18:23:34

Q&A with Andrej Karpathy on AGI still being a decade away, why reinforcement learning is terrible, superintelligence, his AI education startup Eureka, and more (Dwarkesh Patel/Dwarkesh Podcast)
dwarkesh.com/p/andrej-karpathy

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:44:21

CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
Xiaoji Zheng, Ziyuan Yang, Yanhao Chen, Yuhang Peng, Yuanrong Tang, Gengyuan Liu, Bokui Chen, Jiangtao Gong
arxiv.org/abs/2510.12560

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 09:41:52

Robot Learning: A Tutorial
Francesco Capuano, Caroline Pascal, Adil Zouitine, Thomas Wolf, Michel Aractingi
arxiv.org/abs/2510.12403 arxiv.…

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 13:08:08

Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
Wenhan Ma, Hailin Zhang, Liang Zhao, Yifan Song, Yudong Wang, Zhifang Sui, Fuli Luo
arxiv.org/abs/2510.11370

@arXiv_eessSY_bot@mastoxiv.page
2025-10-15 08:05:41

Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Stavros Orfanoudakis, Frans Oliehoek, Peter Palensky, Pedro P. Vergara
arxiv.org/abs/2510.12335

@arXiv_mathOC_bot@mastoxiv.page
2025-10-15 09:36:42

Learning Mean-Field Games through Mean-Field Actor-Critic Flow
Mo Zhou, Haosheng Zhou, Ruimeng Hu
arxiv.org/abs/2510.12180 arxiv.org/pdf/25…

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:37:08

How Reinforcement Learning After Next-Token Prediction Facilitates Learning
Nikolaos Tsilivis, Eran Malach, Karen Ullrich, Julia Kempe
arxiv.org/abs/2510.11495

@arXiv_csMA_bot@mastoxiv.page
2025-10-15 08:03:51

Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning
Simin Li, Zihao Mao, Hanxiao Li, Zonglei Jing, Zhuohang Bian, Jun Guo, Li Wang, Zhuoran Han, Ruixiao Xu, Xin Yu, Chengdong Ma, Yuqing Ma, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu
arxiv.org/abs/2510.11824

@arXiv_csAI_bot@mastoxiv.page
2025-10-15 10:18:51

Inclusive Fitness as a Key Step Towards More Advanced Social Behaviors in Multi-Agent Reinforcement Learning Settings
Andries Rosseau, Raphaël Avalos, Ann Nowé
arxiv.org/abs/2510.12555

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 09:22:41

Pretraining in Actor-Critic Reinforcement Learning for Robot Motion Control
Jiale Fan, Andrei Cramariuc, Tifanny Portela, Marco Hutter
arxiv.org/abs/2510.12363

@arXiv_csLG_bot@mastoxiv.page
2025-10-15 10:47:41

Expert or not? assessing data quality in offline reinforcement learning
Arip Asadulaev, Fakhri Karray, Martin Takac
arxiv.org/abs/2510.12638

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 13:19:08

Demystifying Reinforcement Learning in Agentic Reasoning
Zhaochen Yu, Ling Yang, Jiaru Zou, Shuicheng Yan, Mengdi Wang
arxiv.org/abs/2510.11701

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 10:12:01

Residual MPC: Blending Reinforcement Learning with GPU-Parallelized Model Predictive Control
Se Hwan Jeon, Ho Jae Lee, Seungwoo Hong, Sangbae Kim
arxiv.org/abs/2510.12717

@arXiv_csNI_bot@mastoxiv.page
2025-10-14 10:44:08

A Flexible Multi-Agent Deep Reinforcement Learning Framework for Dynamic Routing and Scheduling of Latency-Critical Services
Vincenzo Norman Vitale, Antonia Maria Tulino, Andreas F. Molisch, Jaime Llorca
arxiv.org/abs/2510.11535

@arXiv_mathOC_bot@mastoxiv.page
2025-10-14 09:34:58

Robust Exploratory Stopping under Ambiguity in Reinforcement Learning
Junyan Ye, Hoi Ying Wong, Kyunghyun Park
arxiv.org/abs/2510.10260 arx…

@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:25:00

Spotlight on Token Perception for Multimodal Reinforcement Learning
Siyuan Huang, Xiaoye Qu, Yafu Li, Yun Luo, Zefeng He, Daizong Liu, Yu Cheng
arxiv.org/abs/2510.09285

@arXiv_qbioNC_bot@mastoxiv.page
2025-12-11 08:16:21

Meta-learning three-factor plasticity rules for structured credit assignment with sparse feedback
Dimitra Maoutsa
arxiv.org/abs/2512.09366 arxiv.org/pdf/2512.09366 arxiv.org/html/2512.09366
arXiv:2512.09366v1 Announce Type: new
Abstract: Biological neural networks learn complex behaviors from sparse, delayed feedback using local synaptic plasticity, yet the mechanisms enabling structured credit assignment remain elusive. In contrast, artificial recurrent networks solving similar tasks typically rely on biologically implausible global learning rules or hand-crafted local updates. The space of local plasticity rules capable of supporting learning from delayed reinforcement remains largely unexplored. Here, we present a meta-learning framework that discovers local learning rules for structured credit assignment in recurrent networks trained with sparse feedback. Our approach interleaves local neo-Hebbian-like updates during task execution with an outer loop that optimizes plasticity parameters via tangent-propagation through learning. The resulting three-factor learning rules enable long-timescale credit assignment using only local information and delayed rewards, offering new insights into biologically grounded mechanisms for learning in recurrent circuits.

@arXiv_csAI_bot@mastoxiv.page
2025-10-15 09:51:01

$\mathbf{T^3}$: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
Deyu Zou, Yongqiang Chen, Jianxiang Wang, Haochen Yang, Mufei Li, James Cheng, Pan Li, Yu Gong
arxiv.org/abs/2510.12264

@arXiv_csMA_bot@mastoxiv.page
2025-10-14 08:03:56

Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective
Shahbaz P Qadri Syed, He Bai
arxiv.org/abs/2510.09937

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 09:49:31

A Task-Efficient Reinforcement Learning Task-Motion Planner for Safe Human-Robot Cooperation
Gaoyuan Liu, Joris de Winter, Kelly Merckaert, Denis Steckelmacher, Ann Nowe, Bram Vanderborght
arxiv.org/abs/2510.12477

@arXiv_statML_bot@mastoxiv.page
2025-10-09 09:23:41

PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing
Jianhan Zhang, Jitao Wang, Chengchun Shi, John D. Piette, Donglin Zeng, Zhenke Wu
arxiv.org/abs/2510.06935

@arXiv_csIR_bot@mastoxiv.page
2025-10-15 08:48:12

Reinforced Preference Optimization for Recommendation
Junfei Tan, Yuxin Chen, An Zhang, Junguang Jiang, Bin Liu, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Xiang Wang
arxiv.org/abs/2510.12211

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:44:11

Reasoning Pattern Matters: Learning to Reason without Human Rationales
Chaoxu Pang, Yixuan Cao, Ping Luo
arxiv.org/abs/2510.12643 arxiv.org…

@arXiv_eessSP_bot@mastoxiv.page
2025-10-10 08:31:48

Utilizing Model-Free Reinforcement Learning for Optimizing Secure Multi-Party Computation Protocols
Javad Sayyadi, Mahdi Nangir, Mahmood Mohassel Feghhi, Hamid Sayyadi
arxiv.org/abs/2510.07814

@arXiv_csSE_bot@mastoxiv.page
2025-10-08 08:23:29

Adaptive Reinforcement Learning for Dynamic Configuration Allocation in Pre-Production Testing
Yu Zhu
arxiv.org/abs/2510.05147 arxiv.org/pd…

@arXiv_csHC_bot@mastoxiv.page
2025-10-14 11:12:58

Assessing Policy Updates: Toward Trust-Preserving Intelligent User Interfaces
Matan Solomon, Ofra Amir, Omer Ben-Porat
arxiv.org/abs/2510.10616

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:38:08

Offline Reinforcement Learning with Generative Trajectory Policies
Xinsong Feng, Leshu Tang, Chenan Wang, Haipeng Chen
arxiv.org/abs/2510.11499

@arXiv_csCE_bot@mastoxiv.page
2025-10-09 07:31:50

Attention-Enhanced Reinforcement Learning for Dynamic Portfolio Optimization
Pei Xue, Yuanchun Ye
arxiv.org/abs/2510.06466 arxiv.org/pdf/25…

@arXiv_eessSY_bot@mastoxiv.page
2025-10-14 11:43:38

A Physics-Informed Reinforcement Learning Approach for Degradation-Aware Long-Term Charging Optimization in Batteries
Shanthan Kumar Padisala, Bharatkumar Hegde, Ibrahim Haskara, Satadru Dey
arxiv.org/abs/2510.11515

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:42:39

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Wei Huang, Yi Ge, Shuai Yang, Yicheng Xiao, Huizi Mao, Yujun Lin, Hanrong Ye, Sifei Liu, Ka Chun Cheung, Hongxu Yin, Yao Lu, Xiaojuan Qi, Song Han, Yukang Chen
arxiv.org/abs/2510.11696

@arXiv_csMA_bot@mastoxiv.page
2025-10-14 09:40:58

Autonomous vehicles need social awareness to find optima in multi-agent reinforcement learning routing games
Anastasia Psarou, Łukasz Gorczyca, Dominik Gaweł, Rafał Kucharski
arxiv.org/abs/2510.11410

@arXiv_csRO_bot@mastoxiv.page
2025-10-14 12:36:08

NaviGait: Navigating Dynamically Feasible Gait Libraries using Deep Reinforcement Learning
Neil C. Janwani, Varun Madabushi, Maegan Tucker
arxiv.org/abs/2510.11542

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:38:18

Context-Aware Model-Based Reinforcement Learning for Autonomous Racing
Emran Yasser Moustafa, Ivana Dusparic
arxiv.org/abs/2510.11501 arxiv…

@arXiv_statML_bot@mastoxiv.page
2025-10-09 09:44:11

Diffusion-Augmented Reinforcement Learning for Robust Portfolio Optimization under Stress Scenarios
Himanshu Choudhary, Arishi Orra, Manoj Thakur
arxiv.org/abs/2510.07099

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 10:08:41

Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning
Alvaro Belmonte-Baeza, Miguel Cazorla, Gabriel J. García, Carlos J. Pérez-Del-Pulgar, Jorge Pomares
arxiv.org/abs/2510.12684

@arXiv_csMA_bot@mastoxiv.page
2025-10-15 08:07:51

Heterogeneous RBCs via deep multi-agent reinforcement learning
Federico Gabriele, Aldo Glielmo, Marco Taboga
arxiv.org/abs/2510.12272 arxiv…

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:26:11

TTRV: Test-Time Reinforcement Learning for Vision Language Models
Akshit Singh, Shyam Marjit, Wei Lin, Paul Gavrikov, Serena Yeung-Levy, Hilde Kuehne, Rogerio Feris, Sivan Doveh, James Glass, M. Jehanzeb Mirza
arxiv.org/abs/2510.06783

@arXiv_csAI_bot@mastoxiv.page
2025-10-15 10:10:21

Biased-Attention Guided Risk Prediction for Safe Decision-Making at Unsignalized Intersections
Chengyang Dong, Nan Guo
arxiv.org/abs/2510.12428

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 14:19:21

Replaced article(s) found for cs.CL. arxiv.org/list/cs.CL/new
[2/5]:
- DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical...
Yichun Feng, Jiawei Wang, Lu Zhou, Zhen Lei, Yixue Li

@arXiv_csRO_bot@mastoxiv.page
2025-10-13 09:35:20

Model-Based Lookahead Reinforcement Learning for in-hand manipulation
Alexandre Lopes, Catarina Barata, Plinio Moreno
arxiv.org/abs/2510.08884

@arXiv_statML_bot@mastoxiv.page
2025-10-09 08:39:31

Online Matching via Reinforcement Learning: An Expert Policy Orchestration Strategy
Chiara Mignacco, Matthieu Jonckheere, Gilles Stoltz
arxiv.org/abs/2510.06515

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 10:59:59

Expressive Value Learning for Scalable Offline Reinforcement Learning
Nicolas Espinosa-Dice, Kiante Brantley, Wen Sun
arxiv.org/abs/2510.08218

@arXiv_csRO_bot@mastoxiv.page
2025-10-13 10:02:50

Obstacle Avoidance using Dynamic Movement Primitives and Reinforcement Learning
Dominik Urbaniak, Alejandro Agostini, Pol Ramon, Jan Rosell, Raúl Suárez, Michael Suppa
arxiv.org/abs/2510.09254

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:38:01

No MoCap Needed: Post-Training Motion Diffusion Models with Reinforcement Learning using Only Textual Prompts
Girolamo Macaluso, Lorenzo Mandelli, Mirko Bicchierai, Stefano Berretti, Andrew D. Bagdanov
arxiv.org/abs/2510.06988

@arXiv_eessSY_bot@mastoxiv.page
2025-10-09 08:33:30

Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Marlon Müller, Florian Finkeldei, Hanna Krasowski, Murat Arcak, Matthias Althoff
arxiv.org/abs/2510.06970

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 07:32:39

L2M-AID: Autonomous Cyber-Physical Defense by Fusing Semantic Reasoning of Large Language Models with Multi-Agent Reinforcement Learning (Preprint)
Tianxiang Xu, Zhichao Wen, Xinyu Zhao, Jun Wang, Yan Li, Chang Liu
arxiv.org/abs/2510.07363

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 10:37:20

FM-IRL: Flow-Matching for Reward Modeling and Policy Regularization in Reinforcement Learning
Zhenglin Wan, Jingxuan Wu, Xingrui Yu, Chubin Zhang, Mingcong Lei, Bo An, Ivor Tsang
arxiv.org/abs/2510.09222

@arXiv_csRO_bot@mastoxiv.page
2025-10-13 09:04:40

CDE: Concept-Driven Exploration for Reinforcement Learning
Le Mao, Andrew H. Liu, Renos Zabounidis, Zachary Kingston, Joseph Campbell
arxiv.org/abs/2510.08851

@arXiv_csCV_bot@mastoxiv.page
2025-10-10 11:11:09

Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning
Andrew Lee, Ian Chuang, Dechen Gao, Kai Fukazawa, Iman Soltani
arxiv.org/abs/2510.08442

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:55:39

From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning
Li Zeqiao, Wang Yijing, Wang Haoyu, Li Zheng, Li Peng, Liu Wenfei, Zuo Zhiqiang
arxiv.org/abs/2510.06038

@arXiv_csAI_bot@mastoxiv.page
2025-10-06 09:46:49

Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
Tianren Ma, Mu Zhang, Yibing Wang, Qixiang Ye
arxiv.org/abs/2510.02880

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 16:11:47

Crosslisted article(s) found for cs.CL. arxiv.org/list/cs.CL/new
[3/3]:
- QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Huang, Ge, Yang, Xiao, Mao, Lin, Ye, Liu, Cheung, Yin, Lu, Qi, Han, Chen

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 22:17:48

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[3/14]:
- OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics
Alexandre Oliveira, Katarina Dyreby, Francisco Caldas, Cláudia Soares

@arXiv_csLG_bot@mastoxiv.page
2025-10-15 10:46:41

Laminar: A Scalable Asynchronous RL Post-Training Framework
Guangming Sheng, Yuxuan Tong, Borui Wan, Wang Zhang, Chaobo Jia, Xibin Wu, Yuqi Wu, Xiang Li, Chi Zhang, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu
arxiv.org/abs/2510.12633

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 13:08:58

KnowRL: Teaching Language Models to Know What They Know
Sahil Kale, Devendra Singh Dhami
arxiv.org/abs/2510.11407 arxiv.org/pdf/2510.11407

@arXiv_csAI_bot@mastoxiv.page
2025-10-14 12:23:38

From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu
arxiv.org/abs/2510.11457

@arXiv_csRO_bot@mastoxiv.page
2025-10-08 10:06:39

Oracle-Guided Masked Contrastive Reinforcement Learning for Visuomotor Policies
Yuhang Zhang, Jiaping Xiao, Chao Yan, Mir Feroskhan
arxiv.org/abs/2510.05692

@arXiv_csRO_bot@mastoxiv.page
2025-10-14 12:40:18

Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation
Maggie Wang, Stephen Tian, Aiden Swann, Ola Shorinwa, Jiajun Wu, Mac Schwager
arxiv.org/abs/2510.11689

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 10:17:59

TaoSR-SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance
Pengkun Jiao, Yiming Jin, Jianhui Yang, Chenhe Dong, Zerui Huang, Shaowei Yao, Xiaojiang Zhou, Dan Ou, Haihong Tang
arxiv.org/abs/2510.07972

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:02:29

Reinforcement Learning from Probabilistic Forecasts for Safe Decision-Making via Conditional Value-at-Risk Planning
Michal Koren, Or Peretz, Tai Dinh, Philip S. Yu
arxiv.org/abs/2510.08226

@arXiv_csCL_bot@mastoxiv.page
2025-10-02 10:48:11

Research on the Integration of Embodied Intelligence and Reinforcement Learning in Textual Domains
Haonan Wang, Junfeng Sun, Mingjia Zhao, Wei Liu
arxiv.org/abs/2510.01076

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:16:59

Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning
Yash Jhaveri, Harley Wiltzer, Patrick Shafto, Marc G. Bellemare, David Meger
arxiv.org/abs/2510.08526

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 10:10:51

Reflection-Based Task Adaptation for Self-Improving VLA
Baicheng Li, Dong Wu, Zike Yan, Xinchen Liu, Zecui Zeng, Lusong Li, Hongbin Zha
arxiv.org/abs/2510.12710

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 10:44:59

Agent Learning via Early Experience
Kai Zhang, Xiangchao Chen, Bo Liu, Tianci Xue, Zeyi Liao, Zhihan Liu, Xiyao Wang, Yuting Ning, Zhaorun Chen, Xiaohan Fu, Jian Xie, Yuxuan Sun, Boyu Gou, Qi Qi, Zihang Meng, Jianwei Yang, Ning Zhang, Xian Li, Ashish Shah, Dat Huynh, Hengduo Li, Zi Yang, Sara Cao, Lawrence Jang, Shuyan Zhou, Jiacheng Zhu, Huan Sun, Jason Weston, Yu Su, Yifan Wu

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:07:39

DeepEN: Personalized Enteral Nutrition for Critically Ill Patients using Deep Reinforcement Learning
Daniel Jason Tan, Jiayang Chen, Dilruk Perera, Kay Choong See, Mengling Feng
arxiv.org/abs/2510.08350

@arXiv_csCL_bot@mastoxiv.page
2025-10-09 10:27:11

Adaptive Tool Generation with Models as Tools and Reinforcement Learning
Chenpeng Wang, Xiaojie Cheng, Chunye Wang, Linfeng Yang, Lei Zhang
arxiv.org/abs/2510.06825

@arXiv_csRO_bot@mastoxiv.page
2025-10-08 10:13:39

Learning to Crawl: Latent Model-Based Reinforcement Learning for Soft Robotic Adaptive Locomotion
Vaughn Gzenda, Robin Chhabra
arxiv.org/abs/2510.05957

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:12:19

ClauseLens: Clause-Grounded, CVaR-Constrained Reinforcement Learning for Trustworthy Reinsurance Pricing
Stella C. Dong, James R. Finlay
arxiv.org/abs/2510.08429

@arXiv_csAI_bot@mastoxiv.page
2025-10-06 08:26:59

A Benchmark Study of Deep Reinforcement Learning Algorithms for the Container Stowage Planning Problem
Yunqi Huang, Nishith Chennakeshava, Alexis Carras, Vladislav Neverov, Wei Liu, Aske Plaat, Yingjie Fan
arxiv.org/abs/2510.02589

@arXiv_csLG_bot@mastoxiv.page
2025-10-06 10:26:39

Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning
Ha Manh Bui, Felix Parker, Kimia Ghobadi, Anqi Liu
arxiv.org/abs/2510.03181

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 12:17:31

Crosslisted article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[4/5]:
- Application of Deep Reinforcement Learning to At-the-Money S&P 500 Options Hedging
Zofia Bracha, Paweł Sakowski, Jakub Michańków

@arXiv_csAI_bot@mastoxiv.page
2025-10-09 09:58:01

Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning
Wenxun Wu, Yuanyang Li, Guhan Chen, Linyue Wang, Hongyang Chen
arxiv.org/abs/2510.07038

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 15:02:10

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[4/8]:
- Latent Variable Modeling in Multi-Agent Reinforcement Learning via Expectation-Maximization for U...
Mazyar Taghavi, Rahman Farnoosh

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 15:02:21

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[5/8]:
- Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Genera...
Wu, Gao, Shi, Li, Xu, Zhang, Zhu, Wang, Luo, Wang, Wu, Huang

@arXiv_csRO_bot@mastoxiv.page
2025-10-13 10:13:20

Guiding Energy-Efficient Locomotion through Impact Mitigation Rewards
Chenghao Wang, Arjun Viswanathan, Eric Sihite, Alireza Ramezani
arxiv.org/abs/2510.09543

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 15:02:00

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[3/8]:
- EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework
Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Yuzhi Zhang, Yue Wang

@arXiv_csLG_bot@mastoxiv.page
2025-10-09 10:55:11

Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski
arxiv.org/abs/2510.07257

@arXiv_csRO_bot@mastoxiv.page
2025-10-07 11:26:52

Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Amirmasoud Molaei, Reza Ghabcheloo
arxiv.org/abs/2510.04168

@arXiv_csRO_bot@mastoxiv.page
2025-10-09 10:08:31

HyPlan: Hybrid Learning-Assisted Planning Under Uncertainty for Safe Autonomous Driving
Donald Pfaffmann, Matthias Klusch, Marcel Steinmetz
arxiv.org/abs/2510.07210

@arXiv_csLG_bot@mastoxiv.page
2025-10-02 11:08:41

Multi-Actor Multi-Critic Deep Deterministic Reinforcement Learning with a Novel Q-Ensemble Method
Andy Wu, Chun-Cheng Lin, Rung-Tzuo Liaw, Yuehua Huang, Chihjung Kuo, Chia Tong Weng
arxiv.org/abs/2510.01083

@arXiv_csRO_bot@mastoxiv.page
2025-10-14 12:39:28

Ego-Vision World Model for Humanoid Contact Planning
Hang Liu, Yuman Gao, Sangli Teng, Yufeng Chi, Yakun Sophia Shao, Zhongyu Li, Maani Ghaffari, Koushil Sreenath
arxiv.org/abs/2510.11682

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:59:29

Multi-Task Reinforcement Learning with Language-Encoded Gated Policy Networks
Rushiv Arora
arxiv.org/abs/2510.06138 arxiv.org/pdf/2510.0613…

@arXiv_csLG_bot@mastoxiv.page
2025-10-06 10:27:29

To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable Reinforcement Learning
Yuda Song, Dhruv Rohatgi, Aarti Singh, J. Andrew Bagnell
arxiv.org/abs/2510.03207

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:16:19

DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Yuanjun Dai, Keqiang He, An Wang
arxiv.org/abs/2510.08522

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:41:38

Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Jens Tuyls, Dylan J. Foster, Akshay Krishnamurthy, Jordan T. Ash
arxiv.org/abs/2510.11686

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 11:01:09

Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents
Mingkang Zhu, Xi Chen, Bei Yu, Hengshuang Zhao, Jiaya Jia
arxiv.org/abs/2510.06214

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:57:39

Learning from Failures: Understanding LLM Alignment through Failure-Aware Inverse RL
Nyal Patel, Matthieu Bou, Arjun Jagota, Satyapriya Krishna, Sonali Parbhoo
arxiv.org/abs/2510.06092

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:40:48

MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
Prasanna Mayilvahanan, Ricardo Dominguez-Olmedo, Thaddäus Wiedemer, Wieland Brendel
arxiv.org/abs/2510.11653

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:41:28

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li
arxiv.org/abs/2510.11683

@arXiv_csLG_bot@mastoxiv.page
2025-10-06 10:22:49

Distributional Inverse Reinforcement Learning
Feiyang Wu, Ye Zhao, Anqi Wu
arxiv.org/abs/2510.03013 arxiv.org/pdf/2510.03013

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:37:38

ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou
arxiv.org/abs/2510.11498

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:43:48

Reinforced sequential Monte Carlo for amortised sampling
Sanghyeok Choi, Sarthak Mittal, Víctor Elvira, Jinkyoo Park, Nikolay Malkin
arxiv.org/abs/2510.11711

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:12:49

xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning
Cheng Qian, Zuxin Liu, Shirley Kokane, Akshara Prabhakar, Jielin Qiu, Haolin Chen, Zhiwei Liu, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang
arxiv.org/abs/2510.08439

@arXiv_csLG_bot@mastoxiv.page
2025-10-06 10:24:29

A Unified Deep Reinforcement Learning Approach for Close Enough Traveling Salesman Problem
Mingfeng Fan, Jiaqi Cheng, Yaoxin Wu, Yifeng Zhang, Yibin Yang, Guohua Wu, Guillaume Sartoretti
arxiv.org/abs/2510.03065

@arXiv_csLG_bot@mastoxiv.page
2025-10-06 10:27:39

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Guanhua Huang, Tingqiang Xu, Mingze Wang, Qi Yi, Xue Gong, Siheng Li, Ruibin Xiong, Kejiao Li, Yuhao Jiang, Bo Zhou
arxiv.org/abs/2510.03222

@arXiv_csLG_bot@mastoxiv.page
2025-10-03 11:00:11

Reinforcement Learning with Action-Triggered Observations
Alexander Ryabchenko, Wenlong Mou
arxiv.org/abs/2510.02149 arxiv.org/pdf/2510.021…

@arXiv_csLG_bot@mastoxiv.page
2025-10-09 10:57:31

h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning
Sumeet Ramesh Motwani, Alesia Ivanova, Ziyang Cai, Philip Torr, Riashat Islam, Shital Shah, Christian Schroeder de Witt, Charles London
arxiv.org/abs/2510.07312

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 10:43:00

HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness
Xinyi Wang, Jinyi Han, Zishang Jiang, Tingyun Li, Jiaqing Liang, Sihang Jiang, Zhaoqian Dai, Shuguang Ma, Fei Yu, Yanghua Xiao
arxiv.org/abs/2510.09388