Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csLG_bot@mastoxiv.page
2025-08-25 09:59:30

RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs
Hangzhan Jin, Sicheng Lv, Sifan Wu, Mohammad Hamdaqa
arxiv.org/abs/2508.16546

@arXiv_csCV_bot@mastoxiv.page
2025-08-26 12:32:47

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, Zhaokai Wang, Zhe Chen, Hongjie Zhang, Ganlin Yang, Haomin Wang, Qi Wei, Jinhui Yin, Wenhao Li, Erfei Cui, Guanzhou Chen, Zichen Ding, Changyao Tian, Zhenyu Wu, Jingjing Xie, Zehao Li, Bowen Yang, Yuchen Duan, Xuehui Wang, Songze Li, Xiangyu Zhao, Haodong Duan, Nianche…

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 09:40:40

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Zengzhi Wang, Fan Zhou, Xuefeng Li, Pengfei Liu
arxiv.org/abs/2506.20512

@arXiv_econGN_bot@mastoxiv.page
2025-07-25 07:55:42

From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models
Zeqiang Zhang, Ruxin Chen
arxiv.org/abs/2507.18229

@Mediagazer@mstdn.social
2025-06-25 20:15:52

Testifying before Congress, Kari Lake said reform at USAGM "was not possible" but the CEOs of RFE/RL, RFA and MBN said she had not met with them even once (Scott Nover/Washington Post)
washingtonpost.com/style/media

@arXiv_csLG_bot@mastoxiv.page
2025-07-24 09:53:09

Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models
Andrii Balashov
arxiv.org/abs/2507.17107

@arXiv_csAI_bot@mastoxiv.page
2025-06-24 10:49:20

Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities
Yuanchen Bei, Weizhi Zhang, Siwen Wang, Weizhi Chen, Sheng Zhou, Hao Chen, Yong Li, Jiajun Bu, Shirui Pan, Yizhou Yu, Irwin King, Fakhri Karray, Philip S. Yu
arxiv.org/abs/2506.18019

@arXiv_csNI_bot@mastoxiv.page
2025-06-24 10:29:30

RL-Driven Semantic Compression Model Selection and Resource Allocation in Semantic Communication Systems
Xinyi Lin, Peizheng Li, Adnan Aijaz
arxiv.org/abs/2506.18660

@arXiv_quantph_bot@mastoxiv.page
2025-07-25 10:05:02

Hybrid quantum-classical algorithm for near-optimal planning in POMDPs
Gilberto Cunha, Alexandra Ramôa, André Sequeira, Michael de Oliveira, Luís Barbosa
arxiv.org/abs/2507.18606

@arXiv_csRO_bot@mastoxiv.page
2025-06-26 08:38:10

Hierarchical Reinforcement Learning and Value Optimization for Challenging Quadruped Locomotion
Jeremiah Coholich, Muhammad Ali Murtaza, Seth Hutchinson, Zsolt Kira
arxiv.org/abs/2506.20036

@arXiv_csSE_bot@mastoxiv.page
2025-08-25 08:51:00

Breaking Barriers in Software Testing: The Power of AI-Driven Automation
Saba Naqvi, Mohammad Baqar
arxiv.org/abs/2508.16025 arxiv.org/pdf/…

@arXiv_csDC_bot@mastoxiv.page
2025-07-25 09:21:11

FCPO: Federated Continual Policy Optimization for Real-Time High-Throughput Edge Video Analytics
Lucas Liebe, Thanh-Tung Nguyen, Dongman Lee
arxiv.org/abs/2507.18047

@arXiv_csLG_bot@mastoxiv.page
2025-08-25 09:54:10

Double Check My Desired Return: Transformer with Target Alignment for Offline Reinforcement Learning
Yue Pei, Hongming Zhang, Chao Gao, Martin Müller, Mengxiao Zhu, Hao Sheng, Haogang Zhu, Liang Lin
arxiv.org/abs/2508.16420

@arXiv_eessSY_bot@mastoxiv.page
2025-06-25 09:34:10

Partially Observable Residual Reinforcement Learning for PV-Inverter-Based Voltage Control in Distribution Grids
Sarra Bouchkati, Ramil Sabirov, Steffen Kortmann, Andreas Ulbig
arxiv.org/abs/2506.19353

@arXiv_qbioQM_bot@mastoxiv.page
2025-07-25 12:42:04

Replaced article(s) found for q-bio.QM. arxiv.org/list/q-bio.QM/new
[1/1]:
- A PBN-RL-XAI Framework for Discovering a "Hit-and-Run" Therapeutic Strategy in Melanoma
Zhonglin Liu

@arXiv_csAI_bot@mastoxiv.page
2025-07-25 13:34:13

Replaced article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[5/5]:
- Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards
Li, Zhou, Kazemi, Sun, Ghaddar, Alomrani, Ma, Luo, Li, Wen, Hao, Coates, Zhang

@arXiv_csMA_bot@mastoxiv.page
2025-08-25 09:10:40

Integrated Noise and Safety Management in UAM via A Unified Reinforcement Learning Framework
Surya Murthy, Zhenyu Gao, John-Paul Clarke, Ufuk Topcu
arxiv.org/abs/2508.16440

@arXiv_csCV_bot@mastoxiv.page
2025-08-26 12:29:26

Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation
Yaqi Li, Peng Chen, Mingyang Han, Bu Pi, Haoxiang Shi, Runzhou Zhao, Yang Yao, Xuan Zhang, Jun Song
arxiv.org/abs/2508.18032

@arXiv_qfinPM_bot@mastoxiv.page
2025-07-25 08:15:21

HARLF: Hierarchical Reinforcement Learning and Lightweight LLM-Driven Sentiment Integration for Financial Portfolio Optimization
Benjamin Coriat, Eric Benhamou
arxiv.org/abs/2507.18560

@arXiv_eessSP_bot@mastoxiv.page
2025-07-24 08:20:49

PPAAS: PVT and Pareto Aware Analog Sizing via Goal-conditioned Reinforcement Learning
Seunggeun Kim, Ziyi Wang, Sungyoung Lee, Youngmin Oh, Hanqing Zhu, Doyun Kim, David Z. Pan
arxiv.org/abs/2507.17003

@arXiv_csNI_bot@mastoxiv.page
2025-08-25 08:41:20

xDiff: Online Diffusion Model for Collaborative Inter-Cell Interference Management in 5G O-RAN
Peihao Yan, Huacheng Zeng, Y. Thomas Hou
arxiv.org/abs/2508.15843

@arXiv_csLG_bot@mastoxiv.page
2025-08-25 09:57:00

On Zero-Shot Reinforcement Learning
Scott Jeen
arxiv.org/abs/2508.16496 arxiv.org/pdf/2508.16496

@arXiv_csCR_bot@mastoxiv.page
2025-08-19 10:22:40

Optimizing Token Choice for Code Watermarking: An RL Approach
Zhimeng Guo, Huaisheng Zhu, Siyuan Xu, Hangfan Zhang, Teng Xiao, Minhao Cheng
arxiv.org/abs/2508.11925

@arXiv_csRO_bot@mastoxiv.page
2025-08-19 09:52:40

Integrating Symbolic RL Planning into a BDI-based Autonomous UAV Framework: System Integration and SIL Validation
Sangwoo Jeon, Juchul Shin, YeonJe Cho, Gyeong-Tae Kim, Seongwoo Kim
arxiv.org/abs/2508.11890

@arXiv_csDC_bot@mastoxiv.page
2025-07-21 07:34:50

DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training
Zhixin Wang, Tianyi Zhou, Liming Liu, Ao Li, Jiarui Hu, Dian Yang, Jinlong Hou, Siyuan Feng, Yuan Cheng, Yuan Qi
arxiv.org/abs/2507.13833

@arXiv_csAR_bot@mastoxiv.page
2025-06-10 07:17:22

QForce-RL: Quantized FPGA-Optimized Reinforcement Learning Compute Engine
Anushka Jha, Tanushree Dewangan, Mukul Lokhande, Santosh Kumar Vishvakarma
arxiv.org/abs/2506.07046

@arXiv_eessSY_bot@mastoxiv.page
2025-06-17 10:56:09

RL-Guided MPC for Autonomous Greenhouse Control
Salim Msaad, Murray Harraway, Robert D. McAllister
arxiv.org/abs/2506.13278

@arXiv_csLG_bot@mastoxiv.page
2025-07-24 09:36:59

Pragmatic Policy Development via Interpretable Behavior Cloning
Anton Matsson, Yaochen Rao, Heather J. Litman, Fredrik D. Johansson
arxiv.org/abs/2507.17056

@arXiv_csIR_bot@mastoxiv.page
2025-08-11 09:52:09

M2IO-R1: An Efficient RL-Enhanced Reasoning Framework for Multimodal Retrieval Augmented Multimodal Generation
Zhiyou Xiao, Qinhan Yu, Binghui Li, Geng Chen, Chong Chen, Wentao Zhang
arxiv.org/abs/2508.06328

@Mediagazer@mstdn.social
2025-05-31 01:16:05

A US federal judge orders the USAGM to immediately disburse RFE/RL's May funding of ~$12M, following a similar order last month for its April funding (Radio Free Europe/Radio Liberty)
rferl.org/a/rfe-rl-order-lambe

@arXiv_csHC_bot@mastoxiv.page
2025-06-17 10:52:09

Can you see how I learn? Human observers' inferences about Reinforcement Learning agents' learning processes
Bernhard Hilpert, Muhan Hou, Kim Baraka, Joost Broekens
arxiv.org/abs/2506.13583

@arXiv_eessIV_bot@mastoxiv.page
2025-07-08 11:39:20

CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning
Fatmaelzahraa Ali Ahmed, Muhammad Arsalan, Abdulaziz Al-Ali, Khalid Al-Jalham, Shidin Balakrishnan
arxiv.org/abs/2507.04317

@arXiv_quantph_bot@mastoxiv.page
2025-08-21 10:04:20

Reinforcement learning entangling operations on spin qubits
Mohammad Abedi, Markus Schmitt
arxiv.org/abs/2508.14761 arxiv.org/pdf/2508.1476…

@arXiv_csLG_bot@mastoxiv.page
2025-07-24 08:46:30

Reinforcement Learning in hyperbolic space for multi-step reasoning
Tao Xu, Dung-Yang Lee, Momiao Xiong
arxiv.org/abs/2507.16864

@pbloem@sigmoid.social
2025-07-18 09:25:22

Now out in #TMLR:
🍇 GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks 🍇
There's lots of work on sampling subgraphs for GNNs, but relatively little on making this sampling process _adaptive_. That is, learning to select the data from the graph that is relevant for your task.
We introduce an RL-based and a GFlowNet-based sampler and show that the approach perf…

A diagram of the GRAPES pipeline. It shows a subgraph being sampled in two steps and being fed to a GNN, with a blue line showing the learning signal. The caption reads Figure 1: Overview of GRAPES. First, GRAPES processes a target node (green) by computing node inclusion probabilities on its 1-hop neighbors (shown by node color shade) with a sampling GNN. Given these probabilities, GRAPES samples k nodes. Then, GRAPES repeats this process over nodes in the 2-hop neighborhood. We pass the sampl…
A results table for node classification on heterophilous graphs. Table 2: F1-scores (%) for different sampling methods trained on heterophilous graphs for a batch size of 256, and a sample size of 256 per layer. We report the mean and standard deviation over 10 runs. The best values among the sampling baselines (all except GAS) are in bold, and the second best are underlined. MC stands for multi-class and ML stands for multi-label classification. OOM indicates out of memory.
Performance of samplers vs sample size, showing that GRAPES generally performs well across sample sizes, while other samplers often show more variance across sample sizes. The caption reads Figure 4: Comparative analysis of classification accuracy across different sampling sizes for sampling baselines and GRAPES. We repeated each experiment five times: The shaded regions show the 95% confidence intervals.
A diagrammatic illustration of a graph classification task used in one of the theorems. The caption reads Figure 9: An example of a graph for Theorem 1 with eight nodes. Red edges belong to E1, features xi and labels yi are shown beside every node. For nodes v1 and v2 we show the edge e12 as an example. As shown, the label of each node is the second feature of its neighbor, where a red edge connects them. The edge homophily ratio is h=12/28 = 0.43.
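
To make the adaptive sampling idea described in the toot above concrete, here is a minimal Python sketch of the layer-wise loop: score the current hop's neighbors, sample k of them by their inclusion probabilities, then expand only from the sampled nodes. The linear scorer standing in for GRAPES' sampling GNN, and every name in the snippet, are illustrative assumptions; training that scorer with the RL- or GFlowNet-based objective is beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_khop(adj, feats, targets, k=2, hops=2, w=None):
    """Layer-wise adaptive sampling: keep k nodes per hop around `targets`.

    adj: dense 0/1 adjacency matrix (n, n); feats: node features (n, d).
    """
    n, d = feats.shape
    w = rng.normal(size=d) if w is None else w  # scorer params (learned in GRAPES)
    frontier, kept = set(targets), set(targets)
    for _ in range(hops):
        # Candidate pool: unvisited neighbors of the current frontier.
        neigh = {j for i in frontier for j in np.flatnonzero(adj[i])} - kept
        if not neigh:
            break
        ids = np.array(sorted(neigh))
        logits = feats[ids] @ w                 # inclusion scores per candidate
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                    # softmax -> inclusion probabilities
        pick = rng.choice(ids, size=min(k, len(ids)), replace=False, p=probs)
        kept |= set(pick.tolist())
        frontier = set(pick.tolist())           # expand from sampled nodes only
    return sorted(kept)

# Toy graph: a path 0-1-2-3-4-5; sample a 2-hop subgraph around node 0.
adj = np.zeros((6, 6), dtype=int)
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1
feats = rng.normal(size=(6, 4))
print(sample_khop(adj, feats, targets=[0]))
```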

@arXiv_csDB_bot@mastoxiv.page
2025-07-21 07:43:30

LLaPipe: LLM-Guided Reinforcement Learning for Automated Data Preparation Pipeline Construction
Jing Chang, Chang Liu, Jinbin Huang, Rui Mao, Jianbin Qin
arxiv.org/abs/2507.13712

@arXiv_csRO_bot@mastoxiv.page
2025-06-24 11:57:40

Robots and Children that Learn Together: Improving Knowledge Retention by Teaching Peer-Like Interactive Robots
Imene Tarakli, Samuele Vinanzi, Richard Moore, Alessandro Di Nuovo
arxiv.org/abs/2506.18365

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:17:30

Compute-Optimal Scaling for Value-Based Deep RL
Preston Fu, Oleh Rybkin, Zhiyuan Zhou, Michal Nauman, Pieter Abbeel, Sergey Levine, Aviral Kumar
arxiv.org/abs/2508.14881

@arXiv_csCL_bot@mastoxiv.page
2025-06-18 09:15:18

Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng, Shaohan Huang, Xuekai Zhu, Bo Dai, Wayne Xin Zhao, Zhenliang Zhang, Furu Wei
arxiv.org/abs/2506.14758

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 09:52:50

Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation
Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Liming Zheng, Yufeng Zhong, Lin Ma
arxiv.org/abs/2508.13587

@arXiv_csSE_bot@mastoxiv.page
2025-07-18 09:11:32

A Survey of Reinforcement Learning for Software Engineering
Dong Wang, Hanmo You, Lingwei Zhu, Kaiwei Lin, Zheng Chen, Chen Yang, Junji Yu, Zan Wang, Junjie Chen
arxiv.org/abs/2507.12483

@arXiv_quantph_bot@mastoxiv.page
2025-07-23 10:27:32

Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis
Sara Giordano, Kornikar Sen, Miguel A. Martin-Delgado
arxiv.org/abs/2507.16641

@arXiv_csLG_bot@mastoxiv.page
2025-08-18 09:48:30

SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling
Jinghui Wang, Shaojie Wang, Yinghan Cui, Xuxing Chen, Chao Wang, Xiaojiang Zhang, Minglei Zhang, Jiarong Zhang, Wenhao Zhuang, Yuchen Cao, Wankang Bao, Haimo Li, Zheng Lin, Huiming Wang, Haoyang Huang, Zongxian Feng, Zizheng Zhan, Ken Deng, Wen Xiang, Huaixi Tang, Kun Wu, Mengtong Li, Mengfei Xie, Junyi Peng, Haotian Zhang, Bin Chen, Bing Yu

@arXiv_qfinPM_bot@mastoxiv.page
2025-08-19 08:45:20

Optimal Portfolio Construction -- A Reinforcement Learning Embedded Bayesian Hierarchical Risk Parity (RL-BHRP) Approach
Shaofeng Kang, Zeying Tian
arxiv.org/abs/2508.11856

@arXiv_csCV_bot@mastoxiv.page
2025-08-14 07:39:22

RL-MoE: An Image-Based Privacy Preserving Approach In Intelligent Transportation System
Abdolazim Rezaei, Mehdi Sookhak, Mahboobeh Haghparast
arxiv.org/abs/2508.09186

@arXiv_csRO_bot@mastoxiv.page
2025-08-21 09:15:39

Efficient Environment Design for Multi-Robot Navigation via Continuous Control
Jahid Chowdhury Choton, John Woods, William Hsu
arxiv.org/abs/2508.14105

@arXiv_csCL_bot@mastoxiv.page
2025-07-18 09:40:32

QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
Jiazheng Li, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Hongzhou Lin, Yi Wu, Jingzhao Zhang
arxiv.org/abs/2507.13266

@arXiv_csHC_bot@mastoxiv.page
2025-07-08 11:55:10

HyperSumm-RL: A Dialogue Summarization Framework for Modeling Leadership Perception in Social Robots
Subasish Das
arxiv.org/abs/2507.04160

@arXiv_eessSY_bot@mastoxiv.page
2025-07-23 09:19:22

A Distributed Actor-Critic Algorithm for Fixed-Time Consensus in Nonlinear Multi-Agent Systems
Aria Delshad, Maryam Babazadeh
arxiv.org/abs/2507.16520

@arXiv_csAI_bot@mastoxiv.page
2025-08-15 09:45:32

Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning
Sangwoo Jeon, Juchul Shin, Gyeong-Tae Kim, YeonJe Cho, Seongwoo Kim
arxiv.org/abs/2508.10747

@arXiv_csLG_bot@mastoxiv.page
2025-08-18 09:43:10

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Wenhao Zhang, Yuexiang Xie, Yuchang Sun, Yanxi Chen, Guoyin Wang, Yaliang Li, Bolin Ding, Jingren Zhou
arxiv.org/abs/2508.11408

@arXiv_csSE_bot@mastoxiv.page
2025-06-03 07:29:49

CRScore++: Reinforcement Learning with Verifiable Tool and AI Feedback for Code Review
Manav Nitin Kapadnis, Atharva Naik, Carolyn Rose
arxiv.org/abs/2506.00296

@arXiv_eessSP_bot@mastoxiv.page
2025-08-21 08:58:59

Deep Reinforcement Learning Based Routing for Heterogeneous Multi-Hop Wireless Networks
Brian Kim, Justin H. Kong, Terrence J. Moore, Fikadu T. Dagefu
arxiv.org/abs/2508.14884

@arXiv_csDB_bot@mastoxiv.page
2025-07-21 07:39:30

CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation
Jing Chang, Chang Liu, Jinbin Huang, Rui Mao, Jianbin Qin
arxiv.org/abs/2507.13710

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 09:07:50

Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward
Jiarui Yang, Bin Zhu, Jingjing Chen, Yu-Gang Jiang
arxiv.org/abs/2508.11143

@arXiv_csCL_bot@mastoxiv.page
2025-08-22 10:15:01

End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
arxiv.org/abs/2508.15746

@arXiv_csAI_bot@mastoxiv.page
2025-08-22 10:06:31

NiceWebRL: a Python library for human subject experiments with reinforcement learning environments
Wilka Carvalho, Vikram Goddla, Ishaan Sinha, Hoon Shin, Kunal Jha
arxiv.org/abs/2508.15693

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:17:20

Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem
Soumyajit Guin, Shalabh Bhatnagar
arxiv.org/abs/2508.13963

@arXiv_quantph_bot@mastoxiv.page
2025-07-17 10:02:20

BenchRL-QAS: Benchmarking reinforcement learning algorithms for quantum architecture search
Azhar Ikhtiarudin, Aditi Das, Param Thakkar, Akash Kundu
arxiv.org/abs/2507.12189

@arXiv_csRO_bot@mastoxiv.page
2025-08-20 09:46:40

Toward Deployable Multi-Robot Collaboration via a Symbolically-Guided Decision Transformer
Rathnam Vidushika Rasanji, Jin Wei-Kocsis, Jiansong Zhang, Dongming Gan, Ragu Athinarayanan, Paul Asunda
arxiv.org/abs/2508.13877

@arXiv_csNI_bot@mastoxiv.page
2025-07-22 09:17:00

PRATA: A Framework to Enable Predictive QoS in Vehicular Networks via Artificial Intelligence
Federico Mason, Tommaso Zugno, Matteo Drago, Marco Giordani, Mate Boban, Michele Zorzi
arxiv.org/abs/2507.14211

@arXiv_eessSY_bot@mastoxiv.page
2025-06-19 08:44:37

Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks
Yimian Ding, Jingzehua Xu, Guanwen Xie, Shuai Zhang, Yi Li
arxiv.org/abs/2506.15082

@arXiv_csLG_bot@mastoxiv.page
2025-07-17 10:13:10

Kevin: Multi-Turn RL for Generating CUDA Kernels
Carlo Baronio, Pietro Marsella, Ben Pan, Simon Guo, Silas Alberti
arxiv.org/abs/2507.11948

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 12:45:43

ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction
Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Xinze Chen, Guanghong Jia, Guan Huang, Wenjun Mei
arxiv.org/abs/2508.08170

@arXiv_csAI_bot@mastoxiv.page
2025-08-22 10:00:51

A Dynamical Systems Framework for Reinforcement Learning Safety and Robustness Verification
Ahmed Nasir, Abdelhafid Zenati
arxiv.org/abs/2508.15588

@arXiv_csRO_bot@mastoxiv.page
2025-06-23 11:54:50

Learning Dexterous Object Handover
Daniel Frau-Alfaro, Julio Castaño-Amoros, Santiago Puente, Pablo Gil, Roberto Calandra
arxiv.org/abs/2506.16822

@arXiv_csCL_bot@mastoxiv.page
2025-08-15 10:06:12

ReviewRL: Towards Automated Scientific Review with RL
Sihang Zeng, Kai Tian, Kaiyan Zhang, Yuru Wang, Junqi Gao, Runze Liu, Sa Yang, Jingxuan Li, Xinwei Long, Jiaheng Ma, Biqing Qi, Bowen Zhou
arxiv.org/abs/2508.10308

@arXiv_csRO_bot@mastoxiv.page
2025-08-19 11:02:40

Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids
Kaizhe Hu, Haochen Shi, Yao He, Weizhuo Wang, C. Karen Liu, Shuran Song
arxiv.org/abs/2508.12252

@arXiv_csCV_bot@mastoxiv.page
2025-07-11 10:19:01

Scaling RL to Long Videos
Yukang Chen, Wei Huang, Baifeng Shi, Qinghao Hu, Hanrong Ye, Ligeng Zhu, Zhijian Liu, Pavlo Molchanov, Jan Kautz, Xiaojuan Qi, Sifei Liu, Hongxu Yin, Yao Lu, Song Han
arxiv.org/abs/2507.07966

@arXiv_eessSY_bot@mastoxiv.page
2025-07-22 17:23:40

Replaced article(s) found for eess.SY. arxiv.org/list/eess.SY/new
[1/2]:
- Maximum Causal Entropy IRL in Mean-Field Games and GNEP Framework for Forward RL
Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi

@arXiv_csAI_bot@mastoxiv.page
2025-07-16 10:17:21

Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light
Mani Hamidi, Terrence W. Deacon
arxiv.org/abs/2507.11482

@arXiv_csRO_bot@mastoxiv.page
2025-08-14 07:45:02

CLF-RL: Control Lyapunov Function Guided Reinforcement Learning
Kejun Li, Zachary Olkin, Yisong Yue, Aaron D. Ames
arxiv.org/abs/2508.09354

@arXiv_csLG_bot@mastoxiv.page
2025-06-10 19:22:39

This arxiv.org/abs/2506.04168 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_csCV_bot@mastoxiv.page
2025-08-19 12:05:10

Breaking Reward Collapse: Adaptive Reinforcement for Open-ended Medical Reasoning with Enhanced Semantic Discrimination
Yizhou Liu, Jingwei Wei, Zizhi Chen, Minghao Han, Xukun Zhang, Keliang Liu, Lihua Zhang
arxiv.org/abs/2508.12957

@arXiv_csCL_bot@mastoxiv.page
2025-06-12 09:05:02

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li
arxiv.org/abs/2506.09942

@arXiv_csRO_bot@mastoxiv.page
2025-06-10 17:29:49

This arxiv.org/abs/2506.04147 has been replaced.
initial toot: mastoxiv.page/@arXiv_csRO_…

@arXiv_csCV_bot@mastoxiv.page
2025-08-12 12:47:03

Reinforcement Learning in Vision: A Survey
Weijia Wu, Chen Gao, Joya Chen, Kevin Qinghong Lin, Qingwei Meng, Yiming Zhang, Yuke Qiu, Hong Zhou, Mike Zheng Shou
arxiv.org/abs/2508.08189

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 10:12:10

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Hanyu Lai, Xiao Liu, Yanxiao Zhao, Han Xu, Hanchen Zhang, Bohao Jing, Yanyu Ren, Shuntian Yao, Yuxiao Dong, Jie Tang
arxiv.org/abs/2508.14040

@arXiv_csRO_bot@mastoxiv.page
2025-06-05 07:23:33

SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL
Jiaheng Hu, Peter Stone, Roberto Martín-Martín
arxiv.org/abs/2506.04147

@arXiv_csLG_bot@mastoxiv.page
2025-06-12 10:00:21

On a few pitfalls in KL divergence gradient estimation for RL
Yunhao Tang, Rémi Munos
arxiv.org/abs/2506.09477

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:19:50

Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback
Wenzhen Yuan, Shengji Tang, Weihao Lin, Jiacheng Ruan, Ganqu Cui, Bo Zhang, Tao Chen, Ting Liu, Yuzhuo Fu, Peng Ye, Lei Bai
arxiv.org/abs/2508.12338

@arXiv_csLG_bot@mastoxiv.page
2025-07-17 10:13:20

Online Training and Pruning of Deep Reinforcement Learning Networks
Valentin Frank Ingmar Guenter, Athanasios Sideris
arxiv.org/abs/2507.11975

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 16:23:12

Replaced article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[5/8]:
- Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
Tyler Ga Wei Lum, Olivia Y. Lee, C. Karen Liu, Jeannette Bohg

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:15:20

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning
Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang
arxiv.org/abs/2508.14765

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 11:10:20

OPTIC-ER: A Reinforcement Learning Framework for Real-Time Emergency Response and Equitable Resource Allocation in Underserved African Communities
Mary Tonwe
arxiv.org/abs/2508.12943

@arXiv_csRO_bot@mastoxiv.page
2025-06-13 09:11:30

Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop
Justin Kerr, Kush Hari, Ethan Weber, Chung Min Kim, Brent Yi, Tyler Bonnen, Ken Goldberg, Angjoo Kanazawa
arxiv.org/abs/2506.10968

@arXiv_csLG_bot@mastoxiv.page
2025-06-05 10:59:18

This arxiv.org/abs/2505.24298 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:31:40

The Yokai Learning Environment: Tracking Beliefs Over Space and Time
Constantin Ruhdorfer, Matteo Bortoletto, Andreas Bulling
arxiv.org/abs/2508.12480

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:16:20

Categorical Policies: Multimodal Policy Learning and Exploration in Continuous Control
SM Mazharul Islam, Manfred Huber
arxiv.org/abs/2508.13922

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:13:10

Reinforcement Learning-based Adaptive Path Selection for Programmable Networks
José Eduardo Zerna Torres, Marios Avgeris, Chrysa Papagianni, Gergely Pongrácz, István Gódor, Paola Grosso
arxiv.org/abs/2508.13806

@arXiv_csLG_bot@mastoxiv.page
2025-07-09 10:13:42

Detecting and Mitigating Reward Hacking in Reinforcement Learning Systems: A Comprehensive Empirical Study
Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
arxiv.org/abs/2507.05619

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:15:40

Revisiting Diffusion Q-Learning: From Iterative Denoising to One-Step Action Generation
Thanh Nguyen, Chang D. Yoo
arxiv.org/abs/2508.13904

@arXiv_csLG_bot@mastoxiv.page
2025-07-11 10:23:41

EXPO: Stable Reinforcement Learning with Expressive Policies
Perry Dong, Qiyang Li, Dorsa Sadigh, Chelsea Finn
arxiv.org/abs/2507.07986 arxiv.org/pdf/2507.07986 arxiv.org/html/2507.07986
arXiv:2507.07986v1 Announce Type: new
Abstract: We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization. Unlike simpler Gaussian policies commonly used in online RL, expressive policies like diffusion and flow-matching policies are parameterized by a long denoising chain, which hinders stable gradient propagation from actions to policy parameters when optimizing against some value function. Our key insight is that we can address stable value maximization by avoiding direct optimization over value with the expressive policy and instead constructing an on-the-fly RL policy to maximize Q-value. We propose Expressive Policy Optimization (EXPO), a sample-efficient online RL algorithm that utilizes an on-the-fly policy to maximize value with two parameterized policies -- a larger expressive base policy trained with a stable imitation learning objective and a light-weight Gaussian edit policy that edits the actions sampled from the base policy toward a higher value distribution. The on-the-fly policy optimizes the actions from the base policy with the learned edit policy and chooses the value-maximizing action from the base and edited actions for both sampling and temporal-difference (TD) backup. Our approach yields up to 2-3x improvement in sample efficiency on average over prior methods, both in the setting of fine-tuning a pretrained policy given offline data and in leveraging offline data to train online.
toXiv_bot_toot
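
The action-selection step in the EXPO abstract above lends itself to a short sketch: draw candidates from the expressive base policy, nudge them with the lightweight Gaussian edit policy, and let Q pick the best action from both sets. The PyTorch snippet below is a minimal sketch under toy assumptions; all classes and dimensions are illustrative stand-ins, not the authors' released implementation.

```python
import torch
import torch.nn as nn

S_DIM, A_DIM = 4, 2  # toy state/action dimensions (assumption)

class BasePolicy(nn.Module):
    """Stand-in for the expressive base policy (e.g. a diffusion policy);
    reduced to a stochastic linear map so the sketch runs end to end."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(S_DIM, A_DIM)
    def sample(self, s, n):
        return self.net(s).expand(n, A_DIM) + 0.1 * torch.randn(n, A_DIM)

class EditPolicy(nn.Module):
    """Light-weight Gaussian edit policy: a small state- and
    action-conditioned shift applied to each base action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(S_DIM + A_DIM, A_DIM)
    def sample(self, s, a):
        mean = self.net(torch.cat([s.expand(a.shape[0], -1), a], dim=-1))
        return mean + 0.05 * torch.randn_like(mean)

class QNet(nn.Module):
    """Toy state-action value function."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(S_DIM + A_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

@torch.no_grad()
def on_the_fly_action(s, base, edit, q, n=8):
    base_a = base.sample(s, n)                   # candidates from the base policy
    edited_a = base_a + edit.sample(s, base_a)   # edited toward higher value
    cand = torch.cat([base_a, edited_a], dim=0)
    q_vals = q(s.expand(cand.shape[0], -1), cand)
    return cand[q_vals.argmax()]                 # value-maximizing action

s = torch.randn(1, S_DIM)
print(on_the_fly_action(s, BasePolicy(), EditPolicy(), QNet()))
```

The same selection rule serves both environment sampling and the TD backup, which is what lets the expressive policy avoid direct gradient-based value maximization.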

@arXiv_csLG_bot@mastoxiv.page
2025-06-05 10:58:34

This arxiv.org/abs/2505.23527 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_csLG_bot@mastoxiv.page
2025-06-12 10:03:21

MOORL: A Framework for Integrating Offline-Online Reinforcement Learning
Gaurav Chaudhary, Wassim Uddin Mondal, Laxmidhar Behera
arxiv.org/abs/2506.09574

@arXiv_csLG_bot@mastoxiv.page
2025-06-03 22:01:41

This arxiv.org/abs/2505.23527 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_csLG_bot@mastoxiv.page
2025-07-09 10:17:22

MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
Yucheng Shi, Wenhao Yu, Zaitang Li, Yonglin Wang, Hongming Zhang, Ninghao Liu, Haitao Mi, Dong Yu
arxiv.org/abs/2507.05720

@arXiv_csLG_bot@mastoxiv.page
2025-06-10 19:18:05

This arxiv.org/abs/2505.00546 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_csLG_bot@mastoxiv.page
2025-06-03 08:21:48

Agnostic Reinforcement Learning: Foundations and Algorithms
Gene Li
arxiv.org/abs/2506.01884 arxiv.org/pdf/2506.01884…

@arXiv_csLG_bot@mastoxiv.page
2025-07-11 10:23:31

Reinforcement Learning with Action Chunking
Qiyang Li, Zhiyuan Zhou, Sergey Levine
arxiv.org/abs/2507.07969 arxiv.org/pdf/2507.07969 arxiv.org/html/2507.07969
arXiv:2507.07969v1 Announce Type: new
Abstract: We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
toXiv_bot_toot
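
The core mechanism in the Q-chunking abstract, running RL directly in a "chunked" action space, can be sketched as an environment wrapper: the agent emits h low-level actions at once, the wrapper executes them and returns the discounted h-step reward, which is exactly the quantity an unbiased h-step TD backup consumes. The Gymnasium-style wrapper below is a minimal sketch under those assumptions, not the authors' code.

```python
import numpy as np
import gymnasium as gym

class ActionChunkWrapper(gym.Wrapper):
    """Executes h low-level actions per agent decision (a 'chunked' action)."""
    def __init__(self, env, h=4, gamma=0.99):
        super().__init__(env)
        self.h, self.gamma = h, gamma

    def step(self, chunk):
        # chunk: (h, action_dim) array; run the actions in sequence and
        # accumulate the discounted reward, stopping early on termination.
        total = 0.0
        for t, a in enumerate(chunk):
            obs, r, terminated, truncated, info = self.env.step(a)
            total += (self.gamma ** t) * r
            if terminated or truncated:
                break
        # An unbiased h-step TD target would then be:
        #   y = total + gamma**(t + 1) * max_a' Q(obs, a')
        return obs, total, terminated, truncated, info

if __name__ == "__main__":
    env = ActionChunkWrapper(gym.make("Pendulum-v1"), h=4)
    obs, _ = env.reset(seed=0)
    chunk = np.stack([env.action_space.sample() for _ in range(4)])
    obs, r_h, term, trunc, info = env.step(chunk)
    print("discounted h-step reward:", r_h)
```

Because the critic is queried only at chunk boundaries, the h-step return above requires no off-policy correction, matching the abstract's point about unbiased n-step backups.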