
2025-06-10 18:12:00
This https://arxiv.org/abs/2204.04198 has been replaced.
link: https://scholar.google.com/scholar?q=a
This https://arxiv.org/abs/2505.22942 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
Ziyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal
https://arxiv.org/abs/2507.06485
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
Jiaqi Tang, Yu Xia, Yi-Feng Wu, Yuwei Hu, Yuhui Chen, Qing-Guo Chen, Xiaogang Xu, Xiangyu Wu, Hao Lu, Yanqing Ma, Shiyin Lu, Qifeng Chen
https://arxiv.org/abs/2506.09373
Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation
Huaicheng Zhang, Wei Tan, Guangzheng Li, Yixuan Zhang, Hangting Chen, Shun Lei, Chenyu Yang, Zhiyong Wu, Shuai Wang, Qijun Huang, Dong Yu
https://arxiv.org/abs/2508.05011
Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models
Rihui Jin, Zheyu Xin, Xing Xie, Zuoyi Li, Guilin Qi, Yongrui Chen, Xinbang Dai, Tongtong Wu, Gholamreza Haffari
https://arxiv.org/abs/2506.06137
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Yong Du, Yuchen Yan, Fei Tang, Zhengxi Lu, Chang Zong, Weiming Lu, Shengpei Jiang, Yongliang Shen
https://arxiv.org/abs/2508.05615
Crowd-SFT: Crowdsourcing for LLM Alignment
Alex Sotiropoulos, Sulyab Thottungal Valapu, Linus Lei, Jared Coleman, Bhaskar Krishnamachari
https://arxiv.org/abs/2506.04063
R1-RE: Cross-Domain Relationship Extraction with RLVR
Runpeng Dai, Tong Zheng, Run Yang, Hongtu Zhu
https://arxiv.org/abs/2507.04642
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[3/6]:
- Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach
Wenyun Li, Wenjie Huang, Chen Sun
Accelerating multiparametric quantitative MRI using self-supervised scan-specific implicit neural representation with model reinforcement
Ruimin Feng, Albert Jang, Xingxin He, Fang Liu
https://arxiv.org/abs/2508.00891
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Luu Tuan Tuan, Yue Liu, Yuhao Qing, Dong Huang, Xinyi He, Qian Liu, Zejun Ma, See-kiong Ng
https://arxiv.org/abs/2505.23387
BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage
Kalyan Nakka, Nitesh Saxena
https://arxiv.org/abs/2506.02479
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
Fangyu Lei, Jinxiang Meng, Yiming Huang, Tinghong Chen, Yun Zhang, Shizhu He, Jun Zhao, Kang Liu
https://arxiv.org/abs/2506.01710
Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection
Huiyi Wang, Fahim Shahriar, Alireza Azimi, Gautham Vasan, Rupam Mahmood, Colin Bellinger
https://arxiv.org/abs/2507.10814
This https://arxiv.org/abs/2505.19713 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csGR_…
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao
https://arxiv.org/abs/2506.22434
This https://arxiv.org/abs/2505.23387 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Yufa Zhou, Shaobo Wang, Xingyu Dong, Xiangqi Jin, Yifang Chen, Yue Min, Kexin Yang, Xingzhang Ren, Dayiheng Liu, Linfeng Zhang
https://arxiv.org/abs/2506.00577
Med-R$^3$: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning
Keer Lu, Zheng Liang, Youquan Li, Jiejun Tan, Da Pan, Shusen Zhang, Guosheng Dong, Huang Leng
https://arxiv.org/abs/2507.23541
Online Training and Pruning of Deep Reinforcement Learning Networks
Valentin Frank Ingmar Guenter, Athanasios Sideris
https://arxiv.org/abs/2507.11975
Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
Luke Rowe, Rodrigue de Schaetzen, Roger Girgis, Christopher Pal, Liam Paull
https://arxiv.org/abs/2506.11234
CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning
Lingxiao Tang, He Ye, Zhongxin Liu, Xiaoxue Ren, Lingfeng Bao
https://arxiv.org/abs/2507.17548
Multimodal Large Language Models: A Survey
Longzhen Han, Awes Mubarak, Almas Baimagambetov, Nikolaos Polatidis, Thar Baker
https://arxiv.org/abs/2506.10016
NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks
Yang Li, Youssef Emad, Karthik Padthe, Jack Lanchantin, Weizhe Yuan, Thao Nguyen, Jason Weston, Shang-Wen Li, Dong Wang, Ilia Kulikov, Xian Li
https://arxiv.org/abs/2507.01921
Multi-Timescale Dynamics Model Bayesian Optimization for Plasma Stabilization in Tokamaks
Rohit Sonker, Alexandre Capone, Andrew Rothstein, Hiro Josep Farre Kaga, Egemen Kolemen, Jeff Schneider
https://arxiv.org/abs/2506.10287
HyperCLOVA X THINK Technical Report
NAVER Cloud HyperCLOVA X Team
https://arxiv.org/abs/2506.22403