Tootfinder

No exact results. Similar results found.

@dcm@social.sunet.se
2025-06-05 14:23:15

Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning from Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.
The paper is the fruit of joint work by a great team of collaborators, among them @… and @…

Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback - Ethics and Information Technology
This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and helpfulness. Through a multidisciplinary sociotechnical critique, we examine both the theoretical underpinni…

@ErikJonker@mastodon.social
2025-07-07 11:38:15

Good article on how reinforcement learning improved current AI models. It also illustrates that today's LLMs are not just imitating.
https://arstechnica.com/ai/2025/07/how

How a big shift in training LLMs led to a capability explosion
Reinforcement learning, explained with a minimum of math and jargon.

@heiseonline@social.heise.de
2025-08-01 17:04:00

Four astronauts launched to the ISS
A capsule from the space company SpaceX is carrying a four-person crew into space. The weather initially caused a delay.
https://www.heise.de/news/Vier-Raumfahrer-

@arXiv_csAI_bot@mastoxiv.page
2025-08-06 10:19:10

Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang
https://arxiv.org/abs/2508.03680

We present Agent Lightning, a flexible and extensible framework that enables Reinforcement Learning (RL)-based training of Large Language Models (LLMs) for any AI agent. Unlike existing methods that tightly couple RL training with agent or rely on sequence concatenation with masking, Agent Lightning achieves complete decoupling between agent execution and training, allowing seamless integration with existing agents developed via diverse ways (e.g., using frameworks like LangChain, OpenAI Agents…

@arXiv_statML_bot@mastoxiv.page
2025-06-06 07:39:46

Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
Haochen Zhang, Zhong Zheng, Lingzhou Xue
https://arxiv.org/abs/2506.04626

Motivated by real-world settings where data collection and policy deployment -- whether for a single agent or across multiple agents -- are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL (FRL) with a focus on minimizing burn-in costs (the sample sizes needed to reach near-optimal regret) and policy switching or communication costs. In parallel finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states and $A$ actions, existing m…

@jlpiraux@wallonie-bruxelles.social
2025-06-07 06:34:26

"The next crisis is not inevitable; it will be the result of choices. Our policymakers can decide to resist the financial lobby and strengthen the safeguards, or else repeat the mistakes of 2008."
#dérégulation #finance

A new financial crisis is brewing - The blog, for finance that serves society
The next crisis is not inevitable; it will be the result of choices. Our policymakers can decide to resist the financial lobby and strengthen the safeguards, or else repeat the mistakes of 2008. REUTERS/Lucas Jackson. The lessons we should have learned from 2008: In 2008, the world suffered the worst financial crisis

@arXiv_mathOC_bot@mastoxiv.page
2025-06-06 07:28:02

Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning
Yuhua Zhu, Yuming Zhang, Haoyu Zhang
https://arxiv.org/abs/2506.05208

This paper addresses continuous-time reinforcement learning (CTRL) where the system dynamics are governed by a stochastic differential equation but are unknown, and only discrete-time observations are available. Existing approaches face limitations: model-based PDE methods suffer from non-identifiability, while model-free methods based on the optimal Bellman equation (Optimal-BE) are prone to large discretization errors sensitive to both the dynamics and reward structure. To overcome these chal…

@arXiv_csSE_bot@mastoxiv.page
2025-08-06 09:25:40

Tool-integrated Reinforcement Learning for Repo Deep Search
Zexiong Ma, Chao Peng, Qunhong Zeng, Pengfei Gao, Yanzhen Zou, Bing Xie
https://arxiv.org/abs/2508.03012 https://

Issue localization, the process of identifying code locations that need modification to resolve software issues, is a critical yet challenging task in software development. The semantic gap between natural language issue descriptions and faulty code requires complex multi-hop reasoning through code dependencies. Existing LLM-based agents attempt to address this by integrating repository retrieval tools. However, this transforms issue localization into a demanding task we call Repo Deep Search, …

@arXiv_csRO_bot@mastoxiv.page
2025-08-05 11:45:31

CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning
Dongchi Huang, Zhirui Fang, Tianle Zhang, Yihang Li, Lin Zhao, Chunhe Xia
https://arxiv.org/abs/2508.02219

Vision-Language-Action (VLA) models demonstrate significant potential for developing generalized policies in real-world robotic control. This progress inspires researchers to explore fine-tuning these models with Reinforcement Learning (RL). However, fine-tuning VLA models with RL still faces challenges related to sample efficiency, compatibility with action chunking, and training stability. To address these challenges, we explore the fine-tuning of VLA models through offline reinforcement lear…

@heiseonline@social.heise.de
2025-08-02 09:31:00

"Crew 11" takes over: four astronauts arrive at the ISS
A capsule from the space company SpaceX has brought a four-person crew to the ISS. "Crew 11" launched one day late.
