Hierarchical Reinforcement Learning Framework for Adaptive Walking Control Using General Value Functions of Lower-Limb Sensor Signals
Sonny T. Jones, Grange M. Simpson, Patrick M. Pilarski, Ashley N. Dalrymple
https://arxiv.org/abs/2507.16983
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Zengzhi Wang, Fan Zhou, Xuefeng Li, Pengfei Liu
https://arxiv.org/abs/2506.20512
A Principled Path to Fitted Distributional Evaluation
Sungee Hong, Jiayi Wang, Zhengling Qi, Raymond Ka Wai Wong
https://arxiv.org/abs/2506.20048
Study of $p_\mathrm{T}$-differential radial flow in blast-wave model
Swati Saha, Ranbir Singh, Bedangadas Mohanty
https://arxiv.org/abs/2505.19697
On the One-Loop Exactness of Gravity Partition Function
Andres Goya, Mauricio Leston, Mario Passaglia
https://arxiv.org/abs/2507.16141
Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.
The paper is the fruit of joint work by a great team of collaborators, among them @… and @…
Robots and Children that Learn Together: Improving Knowledge Retention by Teaching Peer-Like Interactive Robots
Imene Tarakli, Samuele Vinanzi, Richard Moore, Alessandro Di Nuovo
https://arxiv.org/abs/2506.18365
Towards Microgrid Resilience Enhancement via Mobile Power Sources and Repair Crews: A Multi-Agent Reinforcement Learning Approach
Yi Wang, Dawei Qiu, Fei Teng, Goran Strbac
https://arxiv.org/abs/2507.18095
Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Shuang Ma, Xiang Kong, Meng Cao, Graham Neubig, Tongshuang Wu
https://arxiv.org/abs/2507.18624
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control
Ruosen Li, Ziming Luo, Quan Zhang, Ruochen Li, Ben Zhou, Ali Payani, Xinya Du
https://arxiv.org/abs/2506.20160