2025-10-03 10:53:31
RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization
Zhaoning Yu, Will Su, Leitian Tao, Haozhu Wang, Aashu Singh, Hanchao Yu, Jianyu Wang, Hongyang Gao, Weizhe Yuan, Jason Weston, Ping Yu, Jing Xu
https://arxiv.org/abs/2510.02172