Reinforcement Learning in Vision: A Survey
Weijia Wu, Chen Gao, Joya Chen, Kevin Qinghong Lin, Qingwei Meng, Yiming Zhang, Yuke Qiu, Hong Zhou, Mike Zheng Shou
https://arxiv.org/abs/2508.08189
Generating Query-Relevant Document Summaries via Reinforcement Learning
Nitin Yadav, Changsung Kang, Hongwei Shang, Ming Sun
https://arxiv.org/abs/2508.08404
Klear-CodeTest: Scalable Test Case Generation for Code Reinforcement Learning
Jia Fu, Xinyu Yang, Hongzhi Zhang, Yahui Liu, Jingyuan Zhang, Qi Wang, Fuzheng Zhang, Guorui Zhou
https://arxiv.org/abs/2508.05710
Deep Reinforcement Learning-Based Control Strategy with Direct Gate Control for Buck Converters
Noboru Katayama
https://arxiv.org/abs/2508.07693
Rethinking Verification for LLM Code Generation: From Generation to Testing
Zihan Ma, Taolin Zhang, Maosong Cao, Wenwei Zhang, Minnan Luo, Songyang Zhang, Kai Chen
https://arxiv.org/abs/2507.06920
Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning
Bingning Huang, Tu Nguyen, Matthieu Zimmer
https://arxiv.org/abs/2509.09284
RewardDance: Reward Scaling in Visual Generation
Jie Wu, Yu Gao, Zilyu Ye, Ming Li, Liang Li, Hanzhong Guo, Jie Liu, Zeyue Xue, Xiaoxia Hou, Wei Liu, Yan Zeng, Weilin Huang
https://arxiv.org/abs/2509.08826
Careful Queries, Credible Results: Teaching RAG Models Advanced Web Search Tools with Reinforcement Learning
Yuqin Dai, Shuo Yang, Guoqing Wang, Yong Deng, Zhanwei Zhang, Jun Yin, Pengyu Zeng, Zhenzhe Ying, Changhua Meng, Can Yi, Yuchen Zhou, Weiqiang Wang, Shuai Lu
https://arxiv.org/abs/2508.07956
CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization
Xinge Ye, Rui Wang, Yuchuan Wu, Victor Ma, Feiteng Fang, Fei Huang, Yongbin Li
https://arxiv.org/abs/2508.09074
M2IO-R1: An Efficient RL-Enhanced Reasoning Framework for Multimodal Retrieval Augmented Multimodal Generation
Zhiyou Xiao, Qinhan Yu, Binghui Li, Geng Chen, Chong Chen, Wentao Zhang
https://arxiv.org/abs/2508.06328