Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@dcm@social.sunet.se
2025-06-05 14:23:15

Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.
The paper is the fruit of joint work by a great team of collaborators, among them @… and @…

@ErikJonker@mastodon.social
2025-07-07 11:38:15

Good article on how reinforcement learning improved current AI models. It also illustrates that today's LLMs are not just imitating.
arstechnica.com/ai/2025/07/how

@heiseonline@social.heise.de
2025-08-01 17:04:00

Four astronauts launched to the ISS
A capsule from the space company SpaceX is carrying a four-person crew into space. The weather initially caused a delay.
heise.de/news/Vier-Raumfahrer-

@arXiv_csAI_bot@mastoxiv.page
2025-08-06 10:19:10

Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang
arxiv.org/abs/2508.03680

@arXiv_statML_bot@mastoxiv.page
2025-06-06 07:39:46

Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
Haochen Zhang, Zhong Zheng, Lingzhou Xue
arxiv.org/abs/2506.04626

@jlpiraux@wallonie-bruxelles.social
2025-06-07 06:34:26

"The next crisis is not inevitable: it will be the result of choices. Our policymakers can decide to resist the financial lobby and strengthen the safeguards, or else repeat the mistakes of 2008."
#dérégulation #finance

@arXiv_mathOC_bot@mastoxiv.page
2025-06-06 07:28:02

Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning
Yuhua Zhu, Yuming Zhang, Haoyu Zhang
arxiv.org/abs/2506.05208

@arXiv_csSE_bot@mastoxiv.page
2025-08-06 09:25:40

Tool-integrated Reinforcement Learning for Repo Deep Search
Zexiong Ma, Chao Peng, Qunhong Zeng, Pengfei Gao, Yanzhen Zou, Bing Xie
arxiv.org/abs/2508.03012

@arXiv_csRO_bot@mastoxiv.page
2025-08-05 11:45:31

CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning
Dongchi Huang, Zhirui Fang, Tianle Zhang, Yihang Li, Lin Zhao, Chunhe Xia
arxiv.org/abs/2508.02219

@heiseonline@social.heise.de
2025-08-02 09:31:00

"Crew 11" as relief: four astronauts arrive at the ISS
A capsule from the space company SpaceX has brought a four-person crew to the ISS. "Crew 11" launched one day late.