Tootfinder

No exact results. Similar results found.

@heiseonline@social.heise.de
2025-11-14 07:07:01

China brings stranded astronauts back from space
Three Chinese astronauts are stuck in space after an incident presumably involving space debris. Ground control plans to bring them back on Friday.
https://www.<…

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:37:08

How Reinforcement Learning After Next-Token Prediction Facilitates Learning
Nikolaos Tsilivis, Eran Malach, Karen Ullrich, Julia Kempe
https://arxiv.org/abs/2510.11495 https://

Recent advances in reasoning domains with neural networks have primarily been enabled by a training recipe that optimizes Large Language Models, previously trained to predict the next token in a sequence, with reinforcement learning algorithms. We introduce a framework to study the success of this paradigm, and we theoretically expose the optimization mechanisms by which reinforcement learning improves over next-token prediction in this setting. We study learning from mixture distributions of s…

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 13:08:08

Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
Wenhan Ma, Hailin Zhang, Liang Zhao, Yifan Song, Yudong Wang, Zhifang Sui, Fuli Luo
https://arxiv.org/abs/2510.11370

Reinforcement learning (RL) has emerged as a crucial approach for enhancing the capabilities of large language models. However, in Mixture-of-Experts (MoE) models, the routing mechanism often introduces instability, even leading to catastrophic RL training collapse. We analyze the training-inference consistency of MoE models and identify a notable discrepancy in routing behaviors between the two phases. Moreover, even under identical conditions, the routing framework can yield divergent expert …

@heiseonline@social.heise.de
2025-11-13 12:08:00

"Syberia Remastered" reviewed: Caught between eras
A gorgeous graphic adventure gets a renovation. But "Syberia Remastered" unfortunately has weaknesses too. Kate Walker struggles with mice and a doppelgänger.
…

"Syberia Remastered": Schrödinger's Kate Walker

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:38:08

Offline Reinforcement Learning with Generative Trajectory Policies
Xinsong Feng, Leshu Tang, Chenan Wang, Haipeng Chen
https://arxiv.org/abs/2510.11499 https://

Generative models have emerged as a powerful class of policies for offline reinforcement learning (RL) due to their ability to capture complex, multi-modal behaviors. However, existing methods face a stark trade-off: slow, iterative models like diffusion policies are computationally expensive, while fast, single-step models like consistency policies often suffer from degraded performance. In this paper, we demonstrate that it is possible to bridge this gap. The key to moving beyond the limitati…

@heiseonline@social.heise.de
2025-10-14 10:24:00

Starship V2 completes successful final test flight
The final test flight of Starship V2 lasted one hour and went largely without problems. Along the way, the spacecraft deployed dummy satellites.
https://www.

Final flight of Starship V2 was successful

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 13:19:08

Demystifying Reinforcement Learning in Agentic Reasoning
Zhaochen Yu, Ling Yang, Jiaru Zou, Shuicheng Yan, Mengdi Wang
https://arxiv.org/abs/2510.11701 https://

Recently, the emergence of agentic RL has showcased that RL could also effectively improve the agentic reasoning ability of LLMs, yet the key design principles and optimal practices remain unclear. In this work, we conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning from three key perspectives: data, algorithm, and reasoning mode. We highlight our key insights: (i) Replacing stitched synthetic trajectories with real end-to-end tool-use t…

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:43:48

Reinforced sequential Monte Carlo for amortised sampling
Sanghyeok Choi, Sarthak Mittal, Víctor Elvira, Jinkyoo Park, Nikolay Malkin
https://arxiv.org/abs/2510.11711 https…

This paper proposes a synergy of amortised and particle-based methods for sampling from distributions defined by unnormalised density functions. We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL), wherein learnt sampling policies and value functions define proposal kernels and twist functions. Exploiting this connection, we introduce an off-policy RL training procedure for the sampler that uses …

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:38:18

Context-Aware Model-Based Reinforcement Learning for Autonomous Racing
Emran Yasser Moustafa, Ivana Dusparic
https://arxiv.org/abs/2510.11501 https://arxiv…

Autonomous vehicles have shown promising potential to be a groundbreaking technology for improving the safety of road users. For these vehicles, as well as many other safety-critical robotic technologies, to be deployed in real-world applications, we require algorithms that can generalize well to unseen scenarios and data. Model-based reinforcement learning algorithms (MBRL) have demonstrated state-of-the-art performance and data efficiency across a diverse set of domains. However, these algori…

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:42:39

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Wei Huang, Yi Ge, Shuai Yang, Yicheng Xiao, Huizi Mao, Yujun Lin, Hanrong Ye, Sifei Liu, Ka Chun Cheung, Hongxu Yin, Yao Lu, Xiaojuan Qi, Song Han, Yukang Chen
https://arxiv.org/abs/2510.11696

We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs). While RL is essential for LLMs' reasoning capabilities, it is resource-intensive, requiring substantial GPU memory and long rollout durations. QeRL addresses these issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA), accelerating the rollout phase of RL while reducing memory overhead. Beyond efficiency, our findings show that quantization noise increases policy entropy, enh…
