Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.
The paper is the fruit of joint work by a great team of collaborators, among them @… and @…
Good article on how reinforcement learning improved current AI models. It also illustrates that today's LLMs are not just imitating.
https://arstechnica.com/ai/2025/07/how
Four astronauts launched to the ISS
A capsule from the spaceflight company SpaceX is carrying a four-person crew into space. The weather initially caused a delay.
https://www.heise.de/news/Vier-Raumfahrer-
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang
https://arxiv.org/abs/2508.03680
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
Haochen Zhang, Zhong Zheng, Lingzhou Xue
https://arxiv.org/abs/2506.04626
"La prochaine crise n’est pas inévitable – elle sera le résultat de choix. Nos décideurs politiques peuvent décider de résister au lobby financier et renforcer les garde-fous, ou bien répéter les erreurs de 2008."
#dérégulation #finance
Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning
Yuhua Zhu, Yuming Zhang, Haoyu Zhang
https://arxiv.org/abs/2506.05208
Tool-integrated Reinforcement Learning for Repo Deep Search
Zexiong Ma, Chao Peng, Qunhong Zeng, Pengfei Gao, Yanzhen Zou, Bing Xie
https://arxiv.org/abs/2508.03012 https://
CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning
Dongchi Huang, Zhirui Fang, Tianle Zhang, Yihang Li, Lin Zhao, Chunhe Xia
https://arxiv.org/abs/2508.02219