Tootfinder

Opt-in global Mastodon full-text search.

No exact results. Similar results found.
@arXiv_csCV_bot@mastoxiv.page
2025-08-12 12:47:03

Reinforcement Learning in Vision: A Survey
Weijia Wu, Chen Gao, Joya Chen, Kevin Qinghong Lin, Qingwei Meng, Yiming Zhang, Yuke Qiu, Hong Zhou, Mike Zheng Shou
arxiv.org/abs/2508.08189

@arXiv_csIR_bot@mastoxiv.page
2025-08-13 07:49:32

Generating Query-Relevant Document Summaries via Reinforcement Learning
Nitin Yadav, Changsung Kang, Hongwei Shang, Ming Sun
arxiv.org/abs/2508.08404

@arXiv_csSE_bot@mastoxiv.page
2025-08-11 07:42:19

Klear-CodeTest: Scalable Test Case Generation for Code Reinforcement Learning
Jia Fu, Xinyu Yang, Hongzhi Zhang, Yahui Liu, Jingyuan Zhang, Qi Wang, Fuzheng Zhang, Guorui Zhou
arxiv.org/abs/2508.05710

@arXiv_eessSY_bot@mastoxiv.page
2025-08-12 10:59:33

Deep Reinforcement Learning-Based Control Strategy with Direct Gate Control for Buck Converters
Noboru Katayama
arxiv.org/abs/2508.07693 ar…

@arXiv_csCL_bot@mastoxiv.page
2025-07-10 10:05:21

Rethinking Verification for LLM Code Generation: From Generation to Testing
Zihan Ma, Taolin Zhang, Maosong Cao, Wenwei Zhang, Minnan Luo, Songyang Zhang, Kai Chen
arxiv.org/abs/2507.06920

@arXiv_csAI_bot@mastoxiv.page
2025-09-12 09:11:19

Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning
Bingning Huang, Tu Nguyen, Matthieu Zimmer
arxiv.org/abs/2509.09284

@arXiv_csCV_bot@mastoxiv.page
2025-09-11 10:11:43

RewardDance: Reward Scaling in Visual Generation
Jie Wu, Yu Gao, Zilyu Ye, Ming Li, Liang Li, Hanzhong Guo, Jie Liu, Zeyue Xue, Xiaoxia Hou, Wei Liu, Yan Zeng, Weilin Huang
arxiv.org/abs/2509.08826

@arXiv_csIR_bot@mastoxiv.page
2025-08-12 10:13:13

Careful Queries, Credible Results: Teaching RAG Models Advanced Web Search Tools with Reinforcement Learning
Yuqin Dai, Shuo Yang, Guoqing Wang, Yong Deng, Zhanwei Zhang, Jun Yin, Pengyu Zeng, Zhenzhe Ying, Changhua Meng, Can Yi, Yuchen Zhou, Weiqiang Wang, Shuai Lu
arxiv.org/abs/2508.07956

@arXiv_csCL_bot@mastoxiv.page
2025-08-13 10:16:42

CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization
Xinge Ye, Rui Wang, Yuchuan Wu, Victor Ma, Feiteng Fang, Fei Huang, Yongbin Li
arxiv.org/abs/2508.09074

@arXiv_csIR_bot@mastoxiv.page
2025-08-11 09:52:09

M2IO-R1: An Efficient RL-Enhanced Reasoning Framework for Multimodal Retrieval Augmented Multimodal Generation
Zhiyou Xiao, Qinhan Yu, Binghui Li, Geng Chen, Chong Chen, Wentao Zhang
arxiv.org/abs/2508.06328