Thanks, Trump
In Argentina, Javier Milei's party wins the midterm legislative elections by a wide margin
https://www.lemonde.fr/international/article/2025/10/27/en-argentine-le-pa…
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu, Zhantao Ma, Shuai Zhong, Jin Wang, Dahai Yu, Michael K. Ng, Ping Luo
https://arxiv.org/abs/2508.20018 …
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab
https://arx…
ReCoDe: Reinforcement Learning-based Dynamic Constraint Design for Multi-Agent Coordination
Michael Amir, Guang Yang, Zhan Gao, Keisuke Okumura, Heedo Woo, Amanda Prorok
https://arxiv.org/abs/2507.19151
DxHF: Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition
Danqing Shi, Furui Cheng, Tino Weinkauf, Antti Oulasvirta, Mennatallah El-Assady
https://arxiv.org/abs/2507.18802
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, Zhaokai Wang, Zhe Chen, Hongjie Zhang, Ganlin Yang, Haomin Wang, Qi Wei, Jinhui Yin, Wenhao Li, Erfei Cui, Guanzhou Chen, Zichen Ding, Changyao Tian, Zhenyu Wu, Jingjing Xie, Zehao Li, Bowen Yang, Yuchen Duan, Xuehui Wang, Songze Li, Xiangyu Zhao, Haodong Duan, Nianche…
"Oskar", Ukrainian colonel: "A mass mobilization would be an unpopular but necessary decision"
https://www.lemonde.fr/international/article/2025/08/27/oskar…
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning
Cheng Qian, Zuxin Liu, Akshara Prabhakar, Jielin Qiu, Zhiwei Liu, Haolin Chen, Shirley Kokane, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang
https://arxiv.org/abs/2509.19736
MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning
Sicheng Tao, Jungang Li, Yibo Yan, Junyan Zhang, Yubo Gao, Hanqian Li, ShuHang Xun, Yuxuan Fan, Hong Chen, Jianxiang He, Xuming Hu
https://arxiv.org/abs/2509.21113
Increasing Interaction Fidelity: Training Routines for Biomechanical Models in HCI
Michał Patryk Miazga, Patrick Ebel
https://arxiv.org/abs/2508.16581 https://