Tootfinder


@arXiv_csLG_bot@mastoxiv.page
2025-07-04 10:13:51

On Efficient Bayesian Exploration in Model-Based Reinforcement Learning
Alberto Caron, Chris Hicks, Vasilios Mavroudis
https://arxiv.org/abs/2507.02639 htt…

In this work, we address the challenge of data-efficient exploration in reinforcement learning by examining existing principled, information-theoretic approaches to intrinsic motivation. Specifically, we focus on a class of exploration bonuses that targets epistemic uncertainty rather than the aleatoric noise inherent in the environment. We prove that these bonuses naturally signal epistemic information gains and converge to zero once the agent becomes sufficiently certain about the environment…

@arXiv_csAI_bot@mastoxiv.page
2025-09-05 09:58:41

CoT-Space: A Theoretical Framework for Internal Slow-Thinking via Reinforcement Learning
Zeyu Gan, Hao Yi, Yong Liu
https://arxiv.org/abs/2509.04027 https://

Reinforcement Learning (RL) has become a pivotal approach for enhancing the reasoning capabilities of Large Language Models (LLMs). However, a significant theoretical gap persists, as traditional token-level RL frameworks fail to align with the reasoning-level nature of complex, multi-step thought processes like Chain-of-Thought (CoT). To address this challenge, we introduce CoT-Space, a novel theoretical framework that recasts LLM reasoning from a discrete token-prediction task to an optimizat…

@arXiv_csLG_bot@mastoxiv.page
2025-06-05 10:59:18

This https://arxiv.org/abs/2505.24298 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Reinforcement learning (RL) has become a trending paradigm for training large language models (LLMs), particularly for reasoning tasks. Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous by alternating generation and training in a batch setting, where the rollouts in each training batch are generated by the same (or latest) model. This stabilizes RL training but suffers fro…

@arXiv_csCL_bot@mastoxiv.page
2025-09-01 08:42:02

BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
Deepro Choudhury, Sinead Williamson, Adam Goliński, Ning Miao, Freddie Bickford Smith, Michael Kirchhof, Yizhe Zhang, Tom Rainforth
https://arxiv.org/abs/2508.21184

We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian Experimental Design with Large Language Models), is based on iteratively…

@arXiv_csRO_bot@mastoxiv.page
2025-07-02 08:30:20

Control-Optimized Deep Reinforcement Learning for Artificially Intelligent Autonomous Systems
Oren Fivel, Matan Rudman, Kobi Cohen
https://arxiv.org/abs/2507.00268

Deep reinforcement learning (DRL) has become a powerful tool for complex decision-making in machine learning and AI. However, traditional methods often assume perfect action execution, overlooking the uncertainties and deviations between an agent's selected actions and the actual system response. In real-world applications, such as robotics, mechatronics, and communication networks, execution mismatches arising from system dynamics, hardware constraints, and latency can significantly degrade pe…

@arXiv_csAI_bot@mastoxiv.page
2025-09-03 08:59:03

Know When to Explore: Difficulty-Aware Certainty as a Guide for LLM Reinforcement Learning
Ang Li, Zhihang Yuan, Yang Zhang, Shouda Liu, Yisen Wang
https://arxiv.org/abs/2509.00125

Reinforcement Learning with Verifiable Feedback (RLVF) has become a key technique for enhancing the reasoning abilities of Large Language Models (LLMs). However, its reliance on sparse, outcome-based rewards, which only indicate if a final answer is correct or not, fails to provide granular guidance on the reasoning process itself. This limitation hinders efficient learning, as the model cannot distinguish between high-quality and inefficient solutions, nor can it learn effectively from differe…

@arXiv_csLG_bot@mastoxiv.page
2025-09-01 09:52:42

Priors Matter: Addressing Misspecification in Bayesian Deep Q-Learning
Pascal R. van der Vaart, Neil Yorke-Smith, Matthijs T. J. Spaan
https://arxiv.org/abs/2508.21488 https://

Uncertainty quantification in reinforcement learning can greatly improve exploration and robustness. Approximate Bayesian approaches have recently been popularized to quantify uncertainty in model-free algorithms. However, so far the focus has been on improving the accuracy of the posterior approximation, instead of studying the accuracy of the prior and likelihood assumptions underlying the posterior. In this work, we demonstrate that there is a cold posterior effect in Bayesian deep Q-learnin…

@arXiv_csLG_bot@mastoxiv.page
2025-07-31 09:18:31

Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic
Molly Wang
https://arxiv.org/abs/2507.22174 https://arxiv.org/pdf/25…

Reinforcement Learning (RL) has become a well-established approach for optimizing packet routing in communication networks. Standard RL algorithms typically are based on the Markov Decision Process (MDP), which assumes that the current state of the environment provides all the necessary information for system evolution and decision-making. However, this Markovian assumption is invalid in many practical scenarios, making the MDP and RL frameworks inadequate to produce the optimal solutions. Addi…

@arXiv_csLG_bot@mastoxiv.page
2025-09-01 09:58:32

Neural Network Acceleration on MPSoC board: Integrating SLAC's SNL, Rogue Software and Auto-SNL
Hamza Ezzaoui Rahali, Abhilasha Dave, Larry Ruckman, Mohammad Mehdi Rahimifar, Audrey C. Therrien, James J. Russel, Ryan T. Herbst
https://arxiv.org/abs/2508.21739

The LCLS-II Free Electron Laser (FEL) will generate X-ray pulses for beamline experiments at rates of up to 1 MHz, with detectors producing data throughputs exceeding 1 TB/s. Managing such massive data streams presents significant challenges, as transmission and storage infrastructures become prohibitively expensive. Machine learning (ML) offers a promising solution for real-time data reduction, but conventional implementations introduce excessive latency, making them unsuitable for high-speed …
