Reinforcement Learning with Rubric Anchors
Zenan Huang, Yihong Zhuang, Guoshan Lu, Zeyu Qin, Haokai Xu, Tianyu Zhao, Ru Peng, Jiaqi Hu, Zhanming Shen, Xiaomeng Hu, Xijun Gu, Peiyi Tu, Jiaxin Liu, Wenyu Chen, Yuzhuo Fu, Zhiting Fan, Yanmei Gu, Yuanyuan Wang, Zhengkai Yang, Jianguo Li, Junbo Zhao
https://arxiv.org/abs/2508.12790
When someone asks you to do something that's _part of your effin' job_, answer ASAP. Say 'yes' or say 'no' (if you can), but don't let the request sit for days or weeks. The time you spend ignoring the request might cause problems for the human(s) involved.
ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs
Hongxin Ding, Baixiang Huang, Yue Fang, Weibin Liao, Xinke Jiang, Zheng Li, Junfeng Zhao, Yasha Wang
https://arxiv.org/abs/2508.13514
Breaking Reward Collapse: Adaptive Reinforcement for Open-ended Medical Reasoning with Enhanced Semantic Discrimination
Yizhou Liu, Jingwei Wei, Zizhi Chen, Minghao Han, Xukun Zhang, Keliang Liu, Lihua Zhang
https://arxiv.org/abs/2508.12957
Okay. This is my current top candidate for the most absurd, cool, and fascinating experiment/article of 2025.
Can you use candles as a clock signal for a CPU? Surprisingly, the answer is yes.
https://cpldcpu.com/2025/08/13/candle-flame-oscillations-as-a-clock/
LM Agents May Fail to Act on Their Own Risk Knowledge
Yuzhi Tang, Tianxiao Li, Elizabeth Li, Chris J. Maddison, Honghua Dong, Yangjun Ruan
https://arxiv.org/abs/2508.13465 https…
Minimum Sum Coloring with Bundles in Trees and Bipartite Graphs
Takehiro Ito, Naonori Kakimura, Naoyuki Kamiyama, Yusuke Kobayashi, Yoshio Okamoto
https://arxiv.org/abs/2509.15080
Robust Online Calibration for UWB-Aided Visual-Inertial Navigation with Bias Correction
Yizhi Zhou, Jie Xu, Jiawei Xia, Zechen Hu, Weizi Li, Xuan Wang
https://arxiv.org/abs/2508.10999
You may've seen @… 's post about the emails he gets from large companies asking him to answer all kinds of questions about CRA (EU Cyber Resilience Act) compliance. I just saw a similar one sent to the maintainer of another important FOSS library. The kicker: the company uses a version from 2000. (Yes, no typo: 25 years old. I think it has some unfixed vulnerabilities.)
Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback
Wenzhen Yuan, Shengji Tang, Weihao Lin, Jiacheng Ruan, Ganqu Cui, Bo Zhang, Tao Chen, Ting Liu, Yuzhuo Fu, Peng Ye, Lei Bai
https://arxiv.org/abs/2508.12338