The Language of Approval: Identifying the Drivers of Positive Feedback Online
Agam Goyal, Charlotte Lambert, Eshwar Chandrasekharan
https://arxiv.org/abs/2509.10370
Sources: the first-round bid deadline for WBD is November 20; Paramount wants the full company, and Comcast and Netflix are eying movie/TV studios and HBO Max (Wall Street Journal)
https://www.wsj.com/business/media/paramou
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Markov Likelihood
Xingyu Lin, Yilin Wen, En Wang, Du Su, Wenbin Liu, Chenfu Bao, Zhonghou Lv
https://arxiv.org/abs/2510.09369
Reinforced Preference Optimization for Recommendation
Junfei Tan, Yuxin Chen, An Zhang, Junguang Jiang, Bin Liu, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Xiang Wang
https://arxiv.org/abs/2510.12211
TikTok Rewards Divisive Political Messaging During the 2025 German Federal Election
Kirill Solovev, Chiara Drolsbach, Emma Demirel, Nicolas Pröllochs
https://arxiv.org/abs/2509.10336
BaNEL: Exploration Posteriors for Generative Modeling Using Only Negative Rewards
Sangyun Lee, Brandon Amos, Giulia Fanti
https://arxiv.org/abs/2510.09596
The US Treasury and IRS issue guidance allowing crypto products to offer staking rewards under a new safe harbor (Sander Lutz/Decrypt)
https://decrypt.co/348044/ethereum-solana-etfs-green-light-staking-us-treasury-irs-guidance
From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu
https://arxiv.org/abs/2510.11457
Guiding Energy-Efficient Locomotion through Impact Mitigation Rewards
Chenghao Wang, Arjun Viswanathan, Eric Sihite, Alireza Ramezani
https://arxiv.org/abs/2510.09543
Reasoning Pattern Matters: Learning to Reason without Human Rationales
Chaoxu Pang, Yixuan Cao, Ping Luo
https://arxiv.org/abs/2510.12643