Tootfinder

Opt-in global Mastodon full text search.

@arXiv_quantph_bot@mastoxiv.page
2025-06-10 18:12:00

This arxiv.org/abs/2204.04198 has been replaced.
link: scholar.google.com/scholar?q=a

@arXiv_csCL_bot@mastoxiv.page
2025-06-10 18:56:30

This arxiv.org/abs/2505.22942 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCL_…

@arXiv_csCV_bot@mastoxiv.page
2025-07-10 08:54:01

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
Ziyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal
arxiv.org/abs/2507.06485

@arXiv_csLG_bot@mastoxiv.page
2025-06-12 09:50:41

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
Jiaqi Tang, Yu Xia, Yi-Feng Wu, Yuwei Hu, Yuhui Chen, Qing-Guo Chen, Xiaogang Xu, Xiangyu Wu, Hao Lu, Yanqing Ma, Shiyin Lu, Qifeng Chen
arxiv.org/abs/2506.09373

@arXiv_csSD_bot@mastoxiv.page
2025-08-08 08:39:22

Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation
Huaicheng Zhang, Wei Tan, Guangzheng Li, Yixuan Zhang, Hangting Chen, Shun Lei, Chenyu Yang, Zhiyong Wu, Shuai Wang, Qijun Huang, Dong Yu
arxiv.org/abs/2508.05011

@arXiv_csLG_bot@mastoxiv.page
2025-06-09 10:11:32

Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models
Rihui Jin, Zheyu Xin, Xing Xie, Zuoyi Li, Guilin Qi, Yongrui Chen, Xinbang Dai, Tongtong Wu, Gholamreza Haffari
arxiv.org/abs/2506.06137

@arXiv_csCV_bot@mastoxiv.page
2025-08-08 10:29:02

Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Yong Du, Yuchen Yan, Fei Tang, Zhengxi Lu, Chang Zong, Weiming Lu, Shengpei Jiang, Yongliang Shen
arxiv.org/abs/2508.05615

@arXiv_csDC_bot@mastoxiv.page
2025-06-05 07:17:43

Crowd-SFT: Crowdsourcing for LLM Alignment
Alex Sotiropoulos, Sulyab Thottungal Valapu, Linus Lei, Jared Coleman, Bhaskar Krishnamachari
arxiv.org/abs/2506.04063

@arXiv_csCL_bot@mastoxiv.page
2025-07-08 13:50:11

R1-RE: Cross-Domain Relationship Extraction with RLVR
Runpeng Dai, Tong Zheng, Run Yang, Hongtu Zhu
arxiv.org/abs/2507.04642

@arXiv_csAI_bot@mastoxiv.page
2025-08-06 14:35:12

Replaced article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[3/6]:
- Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach
Wenyun Li, Wenjie Huang, Chen Sun

@arXiv_physicsmedph_bot@mastoxiv.page
2025-08-05 08:23:30

Accelerating multiparametric quantitative MRI using self-supervised scan-specific implicit neural representation with model reinforcement
Ruimin Feng, Albert Jang, Xingxin He, Fang Liu
arxiv.org/abs/2508.00891

@arXiv_csSE_bot@mastoxiv.page
2025-05-30 07:21:33

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Luu Tuan Tuan, Yue Liu, Yuhao Qing, Dong Huang, Xinyi He, Qian Liu, Zejun Ma, See-kiong Ng
arxiv.org/abs/2505.23387

@arXiv_csCR_bot@mastoxiv.page
2025-06-04 07:26:46

BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage
Kalyan Nakka, Nitesh Saxena
arxiv.org/abs/2506.02479

@arXiv_csCL_bot@mastoxiv.page
2025-06-03 08:19:46

Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
Fangyu Lei, Jinxiang Meng, Yiming Huang, Tinghong Chen, Yun Zhang, Shizhu He, Jun Zhao, Kang Liu
arxiv.org/abs/2506.01710

@arXiv_csRO_bot@mastoxiv.page
2025-07-16 09:30:31

Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection
Huiyi Wang, Fahim Shahriar, Alireza Azimi, Gautham Vasan, Rupam Mahmood, Colin Bellinger
arxiv.org/abs/2507.10814

@arXiv_csGR_bot@mastoxiv.page
2025-06-02 09:56:59

This arxiv.org/abs/2505.19713 has been replaced.
initial toot: mastoxiv.page/@arXiv_csGR_…

@arXiv_csCV_bot@mastoxiv.page
2025-06-30 10:16:50

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao
arxiv.org/abs/2506.22434

@arXiv_csSE_bot@mastoxiv.page
2025-06-03 17:33:48

This arxiv.org/abs/2505.23387 has been replaced.
initial toot: mastoxiv.page/@arXiv_csSE_…

@arXiv_csAI_bot@mastoxiv.page
2025-06-03 07:20:42

Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Yufa Zhou, Shaobo Wang, Xingyu Dong, Xiangqi Jin, Yifang Chen, Yue Min, Kexin Yang, Xingzhang Ren, Dayiheng Liu, Linfeng Zhang
arxiv.org/abs/2506.00577

@arXiv_csCL_bot@mastoxiv.page
2025-08-01 10:19:11

Med-R$^3$: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning
Keer Lu, Zheng Liang, Youquan Li, Jiejun Tan, Da Pan, Shusen Zhang, Guosheng Dong, Huang Leng
arxiv.org/abs/2507.23541

@arXiv_csLG_bot@mastoxiv.page
2025-07-17 10:13:20

Online Training and Pruning of Deep Reinforcement Learning Networks
Valentin Frank Ingmar Guenter, Athanasios Sideris
arxiv.org/abs/2507.11975

@arXiv_csRO_bot@mastoxiv.page
2025-06-16 07:49:19

Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving
Luke Rowe, Rodrigue de Schaetzen, Roger Girgis, Christopher Pal, Liam Paull
arxiv.org/abs/2506.11234

@arXiv_csSE_bot@mastoxiv.page
2025-07-24 09:42:30

CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning
Lingxiao Tang, He Ye, Zhongxin Liu, Xiaoxue Ren, Lingfeng Bao
arxiv.org/abs/2507.17548

@arXiv_csMM_bot@mastoxiv.page
2025-06-13 08:03:50

Multimodal Large Language Models: A Survey
Longzhen Han, Awes Mubarak, Almas Baimagambetov, Nikolaos Polatidis, Thar Baker
arxiv.org/abs/2506.10016

@arXiv_csCL_bot@mastoxiv.page
2025-07-03 10:15:30

NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks
Yang Li, Youssef Emad, Karthik Padthe, Jack Lanchantin, Weizhe Yuan, Thao Nguyen, Jason Weston, Shang-Wen Li, Dong Wang, Ilia Kulikov, Xian Li
arxiv.org/abs/2507.01921

@arXiv_csRO_bot@mastoxiv.page
2025-06-13 08:06:50

Multi-Timescale Dynamics Model Bayesian Optimization for Plasma Stabilization in Tokamaks
Rohit Sonker, Alexandre Capone, Andrew Rothstein, Hiro Josep Farre Kaga, Egemen Kolemen, Jeff Schneider
arxiv.org/abs/2506.10287

@arXiv_csCL_bot@mastoxiv.page
2025-06-30 10:22:00

HyperCLOVA X THINK Technical Report
NAVER Cloud HyperCLOVA X Team
arxiv.org/abs/2506.22403 arxiv.org/pdf/2506.22403…