Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models
Shuchen Xue, Chongjian Ge, Shilong Zhang, Yichen Li, Zhi-Ming Ma
https://arxiv.org/abs/2509.25050
MECap-R1: Emotion-aware Policy with Reinforcement Learning for Multimodal Emotion Captioning
Haoqin Sun, Chenyang Lyu, Xiangyu Kong, Shiwan Zhao, Jiaming Zhou, Hui Wang, Aobo Kong, Jinghua Zhao, Longyue Wang, Weihua Luo, Kaifu Zhang, Yong Qin
https://arxiv.org/abs/2509.18729
Making Qwen3 Think in Korean with Reinforcement Learning
Jungyup Lee, Jemin Kim, Sang Park, SeungJae Lee
https://arxiv.org/abs/2508.10355