Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csAI_bot@mastoxiv.page
2025-09-01 08:36:33

Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Yi Liao, Yu Gu, Yuan Sui, Zining Zhu, Yifan Lu, Guohua Tang, Zhongqian Sun, Wei Yang
arxiv.org/abs/2508.21365

@arXiv_csSE_bot@mastoxiv.page
2025-09-01 09:00:12

Reusable Test Suites for Reinforcement Learning
Jørn Eirik Betten, Quentin Mazouni, Dennis Gross, Pedro Lind, Helge Spieker
arxiv.org/abs/2508.21553

@arXiv_csRO_bot@mastoxiv.page
2025-08-01 08:28:21

Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks
Vira Joshi, Zifan Xu, Bo Liu, Peter Stone, Amy Zhang
arxiv.org/abs/2507.23172

@arXiv_csLG_bot@mastoxiv.page
2025-09-01 08:10:33

Beyond Prediction: Reinforcement Learning as the Defining Leap in Healthcare AI
Dilruk Perera, Gousia Habib, Qianyi Xu, Daniel J. Tan, Kai He, Erik Cambria, Mengling Feng
arxiv.org/abs/2508.21101

@arXiv_csCL_bot@mastoxiv.page
2025-09-01 08:55:13

Improving Aviation Safety Analysis: Automated HFACS Classification Using Reinforcement Learning with Group Relative Policy Optimization
Arash Ahmadi, Sarah Sharif, Yaser Banad
arxiv.org/abs/2508.21201

@arXiv_quantph_bot@mastoxiv.page
2025-09-01 09:23:02

Reinforcement Learning for Optimizing Large Qubit Array based Quantum Sensor Circuits
Laxmisha Ashok Attisara, Sathish Kumar
arxiv.org/abs/2508.21253

@arXiv_csNI_bot@mastoxiv.page
2025-07-01 10:03:43

Offline Reinforcement Learning for Mobility Robustness Optimization
Pegah Alizadeh, Anastasios Giovanidis, Pradeepa Ramachandra, Vasileios Koutsoukis, Osama Arouk
arxiv.org/abs/2506.22793

@arXiv_statML_bot@mastoxiv.page
2025-06-02 10:18:17

This arxiv.org/abs/2308.13135 has been replaced.
initial toot: mastoxiv.page/@arXiv_sta…

@arXiv_csIR_bot@mastoxiv.page
2025-09-01 08:32:33

Breaking the Cold-Start Barrier: Reinforcement Learning with Double and Dueling DQNs
Minda Zhao
arxiv.org/abs/2508.21259 arxiv.org/pdf/2508…

@arXiv_csDB_bot@mastoxiv.page
2025-08-01 07:36:10

AutoIndexer: A Reinforcement Learning-Enhanced Index Advisor Towards Scaling Workloads
Taiyi Wang, Eiko Yoneki
arxiv.org/abs/2507.23084 arx…

@arXiv_csMA_bot@mastoxiv.page
2025-08-01 07:35:30

Causal-Inspired Multi-Agent Decision-Making via Graph Reinforcement Learning
Jing Wang, Yan Jin, Fei Ding, Chongfeng Wei
arxiv.org/abs/2507.23080

@arXiv_eessSY_bot@mastoxiv.page
2025-09-01 09:14:13

DynaMark: A Reinforcement Learning Framework for Dynamic Watermarking in Industrial Machine Tool Controllers
Navid Aftabi, Abhishek Hanchate, Satish Bukkapatnam, Dan Li
arxiv.org/abs/2508.21797

@arXiv_mathOC_bot@mastoxiv.page
2025-06-02 07:27:33

Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning
Jinbao Wang, Shiliang Zhang, Jun Liu, Xuehui Ma, Haolin Liu
arxiv.org/abs/2505.24572

@arXiv_csSE_bot@mastoxiv.page
2025-09-01 08:21:03

Learning to Generate Unit Test via Adversarial Reinforcement Learning
Dongjun Lee, Changho Hwang, Kimin Lee
arxiv.org/abs/2508.21107 arxiv.…

@arXiv_csDC_bot@mastoxiv.page
2025-07-02 08:15:00

Collaborative Multi-Agent Reinforcement Learning Approach for Elastic Cloud Resource Scaling
Bruce Fang, Danyi Gao
arxiv.org/abs/2507.00550

@arXiv_csRO_bot@mastoxiv.page
2025-07-02 08:30:20

Control-Optimized Deep Reinforcement Learning for Artificially Intelligent Autonomous Systems
Oren Fivel, Matan Rudman, Kobi Cohen
arxiv.org/abs/2507.00268

@arXiv_csLG_bot@mastoxiv.page
2025-09-01 09:47:22

Beyond expected value: geometric mean optimization for long-term policy performance in reinforcement learning
Xinyi Sheng, Dominik Baumann
arxiv.org/abs/2508.21443

@arXiv_csAI_bot@mastoxiv.page
2025-07-01 11:22:43

Self-correcting Reward Shaping via Language Models for Reinforcement Learning Agents in Games
António Afonso, Iolanda Leite, Alessandro Sestini, Florian Fuchs, Konrad Tollmar, Linus Gisslén
arxiv.org/abs/2506.23626

@arXiv_eessSP_bot@mastoxiv.page
2025-09-01 08:48:42

Machine Intelligence on the Edge: Interpretable Cardiac Pattern Localisation Using Reinforcement Learning
Haozhe Tian, Qiyu Rao, Nina Moutonnet, Pietro Ferraro, Danilo Mandic
arxiv.org/abs/2508.21652

@arXiv_csCV_bot@mastoxiv.page
2025-06-30 10:16:50

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao
arxiv.org/abs/2506.22434

@arXiv_csIR_bot@mastoxiv.page
2025-07-01 08:11:53

Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems
Langming Liu, Wanyu Wang, Chi Zhang, Bo Li, Hongzhi Yin, Xuetao Wei, Wenbo Su, Bo Zheng, Xiangyu Zhao
arxiv.org/abs/2506.23090

@arXiv_csCE_bot@mastoxiv.page
2025-07-31 07:42:51

Deep reinforcement learning for efficient exploration of combinatorial structural design spaces
Chloe S. H. Hong, Keith J. Lee, Caitlin T. Mueller
arxiv.org/abs/2507.22804

@arXiv_csRO_bot@mastoxiv.page
2025-07-02 08:42:10

Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation
Yusuke Tanaka, Alvin Zhu, Quanyou Wang, Dennis Hong
arxiv.org/abs/2507.00273

@arXiv_csLG_bot@mastoxiv.page
2025-07-01 08:48:33

Hierarchical Adversarially-Resilient Multi-Agent Reinforcement Learning for Cyber-Physical Systems Security
Saad Alqithami
arxiv.org/abs/2506.22445

@arXiv_csLO_bot@mastoxiv.page
2025-08-01 07:40:30

Explanations for Unrealizability of Infinite-State Safety Shields
Andoni Rodriguez, Irfansha Shaik, Davide Corsi, Roy Fox, Cesar Sanchez
arxiv.org/abs/2507.23603

@arXiv_csCL_bot@mastoxiv.page
2025-08-01 10:19:11

Med-R$^3$: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning
Keer Lu, Zheng Liang, Youquan Li, Jiejun Tan, Da Pan, Shusen Zhang, Guosheng Dong, Huang Leng
arxiv.org/abs/2507.23541

@Techmeme@techhub.social
2025-06-28 03:51:27

Sources: Applied Compute, a pre-launch reinforcement learning startup founded by three former OpenAI staffers, raised $20M at a $100M valuation led by Benchmark (Alex Konrad/Upstarts Media)
upstartsmedia.com/p/ex-openai-

@arXiv_csMA_bot@mastoxiv.page
2025-06-02 07:19:47

R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning
Harsh Goel, Mohammad Omama, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Sandeep Chinchali
arxiv.org/abs/2505.24265

@arXiv_csDC_bot@mastoxiv.page
2025-09-01 07:43:02

A Knowledge Distillation-empowered Adaptive Federated Reinforcement Learning Framework for Multi-Domain IoT Applications Scheduling
Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya
arxiv.org/abs/2508.21328

@arXiv_eessIV_bot@mastoxiv.page
2025-09-01 10:16:52

Crosslisted article(s) found for eess.IV. arxiv.org/list/eess.IV/new
[1/1]:
- QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
Allen Wang, Gavin Tao

@arXiv_csRO_bot@mastoxiv.page
2025-08-01 09:55:41

Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents
Shaofei Cai, Zhancun Mu, Haiwen Xia, Bowei Zhang, Anji Liu, Yitao Liang
arxiv.org/abs/2507.23698

@arXiv_csGR_bot@mastoxiv.page
2025-06-02 09:56:59

This arxiv.org/abs/2505.19713 has been replaced.
initial toot: mastoxiv.page/@arXiv_csGR_…

@arXiv_mathOC_bot@mastoxiv.page
2025-06-30 09:05:40

A Reinforcement Learning Framework for Some Singular Stochastic Control Problems
Zongxia Liang, Xiaodong Luo, Xiang Yu
arxiv.org/abs/2506.22203

@arXiv_csNI_bot@mastoxiv.page
2025-07-01 08:11:43

Reliable Transmission of LTP Using Reinforcement Learning-Based Adaptive FEC
Liang Chen, Yu Song, Kanglian Zhao, Juan A. Fraire, Wenfeng Li
arxiv.org/abs/2506.22470

@arXiv_statCO_bot@mastoxiv.page
2025-07-02 08:07:10

Harnessing the Power of Reinforcement Learning for Adaptive MCMC
Congye Wang, Matthew A. Fisher, Heishiro Kanagawa, Wilson Chen, Chris J. Oates
arxiv.org/abs/2507.00671

@arXiv_statME_bot@mastoxiv.page
2025-07-01 17:22:35

Replaced article(s) found for stat.ME. arxiv.org/list/stat.ME/new
[2/2]:
- Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference
Lars van der Laan, David Hubbard, Allen Tran, Nathan Kallus, Aurélien Bibaut

@arXiv_csCR_bot@mastoxiv.page
2025-07-30 10:26:21

Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security
Muzhi Dai, Shixuan Liu, Zhiyuan Zhao, Junyu Gao, Hao Sun, Xuelong Li
arxiv.org/abs/2507.22037

@arXiv_csNE_bot@mastoxiv.page
2025-06-02 07:20:02

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
Zijie Xu, Tong Bu, Zecheng Hao, Jianhao Ding, Zhaofei Yu
arxiv.org/abs/2505.24161

@arXiv_csCL_bot@mastoxiv.page
2025-09-01 09:46:22

Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards
Xiaolong Wei, Bo Lu, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin
arxiv.org/abs/2508.21476

@arXiv_csRO_bot@mastoxiv.page
2025-07-01 11:46:33

Multi-Timescale Hierarchical Reinforcement Learning for Unified Behavior and Control of Autonomous Driving
Guizhe Jin, Zhuoren Li, Bo Leng, Ran Yu, Lu Xiong
arxiv.org/abs/2506.23771

@arXiv_csLG_bot@mastoxiv.page
2025-09-01 09:52:42

Priors Matter: Addressing Misspecification in Bayesian Deep Q-Learning
Pascal R. van der Vaart, Neil Yorke-Smith, Matthijs T. J. Spaan
arxiv.org/abs/2508.21488

@arXiv_eessSY_bot@mastoxiv.page
2025-07-31 09:11:31

Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction
Alex Durkin, Jasper Stolte, Matthew Jones, Raghuraman Pitchumani, Bei Li, Christian Michler, Mehmet Mercangöz
arxiv.org/abs/2507.22640

@arXiv_csMA_bot@mastoxiv.page
2025-06-02 07:19:33

Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multi-Agent Reinforcement Learning
Pengcheng Dai, Yuanqiu Mo, Wenwu Yu, Wei Ren
arxiv.org/abs/2505.24113

@arXiv_csCV_bot@mastoxiv.page
2025-09-01 10:33:40

Crosslisted article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[1/1]:
- QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
Allen Wang, Gavin Tao

@arXiv_csIR_bot@mastoxiv.page
2025-06-30 09:51:20

Reward Balancing Revisited: Enhancing Offline Reinforcement Learning for Recommender Systems
Wenzheng Shu, Yanxiang Zeng, Yongxiang Tang, Teng Sha, Ning Luo, Yanhua Cheng, Xialong Liu, Fan Zhou, Peng Jiang
arxiv.org/abs/2506.22112

@arXiv_csRO_bot@mastoxiv.page
2025-08-01 09:29:51

Learning to Drift with Individual Wheel Drive: Maneuvering Autonomous Vehicle at the Handling Limits
Yihan Zhou, Yiwen Lu, Bo Yang, Jiayun Li, Yilin Mo
arxiv.org/abs/2507.23339

@arXiv_quantph_bot@mastoxiv.page
2025-08-29 09:54:11

Noise-Resilient Quantum Reinforcement Learning
Jing-Ci Yue, Jun-Hong An
arxiv.org/abs/2508.20601 arxiv.org/pdf/2508.20601

@arXiv_csCE_bot@mastoxiv.page
2025-06-30 07:31:29

Laser Scan Path Design for Controlled Microstructure in Additive Manufacturing with Integrated Reduced-Order Phase-Field Modeling and Deep Reinforcement Learning
Augustine Twumasi, Prokash Chandra Roy, Zixun Li, Soumya Shouvik Bhattacharjee, Zhengtao Gan
arxiv.org/abs/2506.21815

@arXiv_statML_bot@mastoxiv.page
2025-06-30 13:06:59

Replaced article(s) found for stat.ML. arxiv.org/list/stat.ML/new
[1/2]:
- Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings
C. Shi, S. Zhang, W. Lu, R. Song

@arXiv_csAI_bot@mastoxiv.page
2025-08-29 09:51:41

Single Agent Robust Deep Reinforcement Learning for Bus Fleet Control
Yifan Zhang
arxiv.org/abs/2508.20784 arxiv.org/pdf/2508.20784

@arXiv_eessSY_bot@mastoxiv.page
2025-07-31 08:46:21

Toward Trusted Onboard AI: Advancing Small Satellite Operations using Reinforcement Learning
Cannon Whitney, Joseph Melville
arxiv.org/abs/2507.22198

@arXiv_csLG_bot@mastoxiv.page
2025-07-02 14:33:20

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[2/5]:
- Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
Wang, Shao, Zhang, Yu, Liu, Ding, Cao, Kang, Wang

@arXiv_csRO_bot@mastoxiv.page
2025-09-01 09:17:32

Learning to Assemble the Soma Cube with Legal-Action Masked DQN and Safe ZYZ Regrasp on a Doosan M0609
Jaehong Oh, Seungjun Jung, Sawoong Kim
arxiv.org/abs/2508.21272

@arXiv_csLG_bot@mastoxiv.page
2025-07-31 09:18:31

Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic
Molly Wang
arxiv.org/abs/2507.22174 arxiv.org/pdf/25…

@arXiv_csCV_bot@mastoxiv.page
2025-08-29 10:31:11

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Yibin Wang, Zhimin Li, Yuhang Zang, Yujie Zhou, Jiazi Bu, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang
arxiv.org/abs/2508.20751

@arXiv_csCL_bot@mastoxiv.page
2025-07-31 09:55:21

Reducing Hallucinations in Summarization via Reinforcement Learning with Entity Hallucination Index
Praveenkumar Katwe, Rakesh Chandra, Balabantaray Kali, Prasad Vittala
arxiv.org/abs/2507.22744

@arXiv_csRO_bot@mastoxiv.page
2025-06-02 07:21:36

Reactive Aerobatic Flight via Reinforcement Learning
Zhichao Han, Xijie Huang, Zhuxiu Xu, Jiarui Zhang, Yuze Wu, Mingyang Wang, Tianyue Wu, Fei Gao
arxiv.org/abs/2505.24396

@arXiv_csAI_bot@mastoxiv.page
2025-09-01 12:06:05

Replaced article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[1/4]:
- Policy Expansion for Bridging Offline-to-Online Reinforcement Learning
Haichao Zhang, We Xu, Haonan Yu

@arXiv_csCL_bot@mastoxiv.page
2025-07-31 09:53:11

From Sufficiency to Reflection: Reinforcement-Guided Thinking Quality in Retrieval-Augmented Reasoning for LLMs
Jie He, Victor Gutierrez Basulto, Jeff Z. Pan
arxiv.org/abs/2507.22716

@arXiv_csRO_bot@mastoxiv.page
2025-09-01 09:32:42

Learning Agile Gate Traversal via Analytical Optimal Policy Gradient
Tianchen Sun, Bingheng Wang, Longbin Tang, Yichao Gao, Lin Zhao
arxiv.org/abs/2508.21592

@arXiv_statML_bot@mastoxiv.page
2025-06-27 09:13:19

Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games
Yann Kerzreho (ENS Paris Saclay)
arxiv.org/abs/2506.21079

@arXiv_csCV_bot@mastoxiv.page
2025-07-30 10:42:21

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
Zigang Geng, Yibing Wang, Yeyao Ma, Chen Li, Yongming Rao, Shuyang Gu, Zhao Zhong, Qinglin Lu, Han Hu, Xiaosong Zhang, Linus, Di Wang, Jie Jiang
arxiv.org/abs/2507.22058

@arXiv_csAI_bot@mastoxiv.page
2025-09-01 10:54:49

Crosslisted article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[1/5]:
- QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
Allen Wang, Gavin Tao

@arXiv_quantph_bot@mastoxiv.page
2025-06-27 10:10:39

Reinforcement Learning for Optimal Control of Spin Magnetometers
Logan W. Cooke, Stefanie Czischek
arxiv.org/abs/2506.21475

@arXiv_csSE_bot@mastoxiv.page
2025-06-27 08:07:49

Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance
Kyanna Dagenais, Istvan David
arxiv.org/abs/2506.20883

@arXiv_csRO_bot@mastoxiv.page
2025-06-02 10:27:15

This arxiv.org/abs/2505.20751 has been replaced.
initial toot: mastoxiv.page/@arXiv_csRO_…

@arXiv_csLG_bot@mastoxiv.page
2025-09-01 12:11:05

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[2/4]:
- BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement L...
Yunpeng Qing, Shuo Chen, Yixiao Chi, Shunyu Liu, Sixu Lin, Kelu Yao, Changqing Zou

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 10:15:40

SAFER: Probing Safety in Reward Models with Sparse Autoencoder
Sihang Li, Wei Shi, Ziyuan Xie, Tao Liang, Guojun Ma, Xiang Wang
arxiv.org/abs/2507.00665

@arXiv_csAI_bot@mastoxiv.page
2025-07-02 14:25:35

Replaced article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[3/5]:
- Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
Wang, Shao, Zhang, Yu, Liu, Ding, Cao, Kang, Wang

@arXiv_eessSY_bot@mastoxiv.page
2025-09-01 10:37:02

Crosslisted article(s) found for eess.SY. arxiv.org/list/eess.SY/new
[1/1]:
- QuadKAN: KAN-Enhanced Quadruped Motion Control via End-to-End Reinforcement Learning
Allen Wang, Gavin Tao

@arXiv_csCV_bot@mastoxiv.page
2025-07-31 13:39:21

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[4/5]:
- Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Fe...
Chen, Shen, Huang, Zhou, Lin, Cai, Yu, Bu, Shi, Qiao

@arXiv_csCL_bot@mastoxiv.page
2025-07-30 10:28:01

Post-Training Large Language Models via Reinforcement Learning from Self-Feedback
Carel van Niekerk, Renato Vukovic, Benjamin Matthias Ruppik, Hsien-chin Lin, Milica Gašić
arxiv.org/abs/2507.21931

@arXiv_csMA_bot@mastoxiv.page
2025-07-30 07:47:31

Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model
Zixin Feng, Qunshan Zhao, Alison Heppenstall
arxiv.org/abs/2507.21341

@arXiv_csLG_bot@mastoxiv.page
2025-09-01 09:53:42

Spiking Decision Transformers: Local Plasticity, Phase-Coding, and Dendritic Routing for Low-Power Sequence Control
Vishal Pandey, Debasmita Biswas
arxiv.org/abs/2508.21505

@arXiv_csCL_bot@mastoxiv.page
2025-07-30 10:25:31

AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning
Yifan Wei, Xiaoyan Yu, Yixuan Weng, Tengfei Pan, Angsheng Li, Li Du
arxiv.org/abs/2507.21836

@arXiv_csCV_bot@mastoxiv.page
2025-07-30 10:41:41

From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning
Honglin He, Yukai Ma, Wayne Wu, Bolei Zhou
arxiv.org/abs/2507.22028

@arXiv_eessSY_bot@mastoxiv.page
2025-07-01 08:14:43

QoS-aware State-Augmented Learnable Algorithm for Wireless Coexistence Parameter Management
Mohammad Reza Fasihi, Brian L. Mark
arxiv.org/abs/2506.22652

@arXiv_csAI_bot@mastoxiv.page
2025-08-29 07:56:40

AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning
Lang Mei, Zhihan Yang, Chong Chen
arxiv.org/abs/2508.20368

@arXiv_csCL_bot@mastoxiv.page
2025-07-02 09:19:09

Linearly Decoding Refused Knowledge in Aligned Language Models
Aryan Shrivastava, Ari Holtzman
arxiv.org/abs/2507.00239

@arXiv_csRO_bot@mastoxiv.page
2025-06-30 09:18:20

Experimental investigation of pose informed reinforcement learning for skid-steered visual navigation
Ameya Salvi, Venkat Krovi
arxiv.org/abs/2506.21732

@arXiv_eessSY_bot@mastoxiv.page
2025-09-01 08:33:52

Cooperative Sensing Enhanced UAV Path-Following and Obstacle Avoidance with Variable Formation
Changheng Wang, Zhiqing Wei, Wangjun Jiang, Haoyue Jiang, Zhiyong Feng
arxiv.org/abs/2508.21316

@arXiv_csCL_bot@mastoxiv.page
2025-07-30 10:25:41

Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
Haoran Luo, Haihong E, Guanting Chen, Qika Lin, Yikai Guo, Fangzhi Xu, Zemin Kuang, Meina Song, Xiaobao Wu, Yifan Zhu, Luu Anh Tuan
arxiv.org/abs/2507.21892

@arXiv_eessSY_bot@mastoxiv.page
2025-07-01 09:55:53

Real-Time Energy Management Strategies for Community Microgrids
Moslem Uddin, Huadong Mo, Daoyi Dong
arxiv.org/abs/2506.22931

@arXiv_csAI_bot@mastoxiv.page
2025-08-28 09:32:41

SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu, Zhantao Ma, Shuai Zhong, Jin Wang, Dahai Yu, Michael K. Ng, Ping Luo
arxiv.org/abs/2508.20018

@arXiv_csRO_bot@mastoxiv.page
2025-07-30 09:35:42

Model Predictive Adversarial Imitation Learning for Planning from Observation
Tyler Han, Yanda Bao, Bhaumik Mehta, Gabriel Guo, Anubhav Vishwakarma, Emily Kang, Sanghun Jung, Rosario Scalise, Jason Zhou, Bryan Xu, Byron Boots
arxiv.org/abs/2507.21533

@arXiv_csRO_bot@mastoxiv.page
2025-09-01 11:41:21

Replaced article(s) found for cs.RO. arxiv.org/list/cs.RO/new
[2/2]:
- Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
Abdulaziz Almuzairee, Rohan Patil, Dwait Bhatt, Henrik I. Christensen

@arXiv_csLG_bot@mastoxiv.page
2025-08-25 09:54:10

Double Check My Desired Return: Transformer with Target Alignment for Offline Reinforcement Learning
Yue Pei, Hongming Zhang, Chao Gao, Martin Müller, Mengxiao Zhu, Hao Sheng, Haogang Zhu, Liang Lin
arxiv.org/abs/2508.16420

@arXiv_csCL_bot@mastoxiv.page
2025-07-28 09:58:01

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab
arx…

@arXiv_eessSY_bot@mastoxiv.page
2025-08-29 08:19:21

A Hierarchical Signal Coordination and Control System Using a Hybrid Model-based and Reinforcement Learning Approach
Xianyue Peng, Shenyang Chen, H. Michael Zhang
arxiv.org/abs/2508.20102

@arXiv_csCL_bot@mastoxiv.page
2025-07-30 10:18:01

Libra: Assessing and Improving Reward Model by Learning to Think
Meng Zhou, Bei Li, Jiahao Liu, Xiaowen Shi, Yang Bai, Rongxiang Weng, Jingang Wang, Xunliang Cai
arxiv.org/abs/2507.21645

@arXiv_csAI_bot@mastoxiv.page
2025-07-31 09:17:21

Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies
Hugo Garrido-Lestache, Jeremy Kedziora
arxiv.org/abs/2507.22782

@arXiv_csLG_bot@mastoxiv.page
2025-08-27 10:35:43

Active Query Selection for Crowd-Based Reinforcement Learning
Jonathan Erskine, Taku Yamagata, Raúl Santos-Rodríguez
arxiv.org/abs/2508.19132

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 09:40:40

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Zengzhi Wang, Fan Zhou, Xuefeng Li, Pengfei Liu
arxiv.org/abs/2506.20512

@arXiv_csLG_bot@mastoxiv.page
2025-08-27 10:28:13

DRMD: Deep Reinforcement Learning for Malware Detection under Concept Drift
Shae McFadden, Myles Foley, Mario D'Onghia, Chris Hicks, Vasilios Mavroudis, Nicola Paoletti, Fabio Pierazzi
arxiv.org/abs/2508.18839

@arXiv_csAI_bot@mastoxiv.page
2025-07-31 08:35:41

On the Definition of Intelligence
Kei-Sing Ng
arxiv.org/abs/2507.22423 arxiv.org/pdf/2507.22423

@arXiv_csRO_bot@mastoxiv.page
2025-08-29 08:57:11

Task Allocation for Autonomous Machines using Computational Intelligence and Deep Reinforcement Learning
Thanh Thi Nguyen, Quoc Viet Hung Nguyen, Jonathan Kua, Imran Razzak, Dung Nguyen, Saeid Nahavandi
arxiv.org/abs/2508.20688

@arXiv_csRO_bot@mastoxiv.page
2025-07-29 11:10:11

Bipedalism for Quadrupedal Robots: Versatile Loco-Manipulation through Risk-Adaptive Reinforcement Learning
Yuyou Zhang, Radu Corcodel, Ding Zhao
arxiv.org/abs/2507.20382

@arXiv_csCL_bot@mastoxiv.page
2025-08-28 10:01:31

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Hinrich Schütze, Volker Tresp, Yunpu Ma
arxiv.org/abs/2508.19828

@arXiv_csRO_bot@mastoxiv.page
2025-08-28 08:57:51

Impedance Primitive-augmented Hierarchical Reinforcement Learning for Sequential Tasks
Amin Berjaoui Tahmaz, Ravi Prakash, Jens Kober
arxiv.org/abs/2508.19607

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 09:39:40

ReCode: Updating Code API Knowledge with Reinforcement Learning
Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang
arxiv.org/abs/2506.20495