h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning
Sumeet Ramesh Motwani, Alesia Ivanova, Ziyang Cai, Philip Torr, Riashat Islam, Shital Shah, Christian Schroeder de Witt, Charles London
https://arxiv.org/abs/2510.07312
Large language models excel at short-horizon reasoning tasks, but performance drops as reasoning horizon lengths increase. Existing approaches to combat this rely on inference-time scaffolding or costly step-level supervision, neither of which scales easily. In this work, we introduce a scalable method to bootstrap long-horizon reasoning capabilities using only existing, abundant short-horizon data. Our approach synthetically composes simple problems into complex, multi-step dependency chains o…
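
The idea of composing simple problems into a dependency chain can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ShortProblem` and `compose_chain`, the toy arithmetic templates, and the seed value are all hypothetical, chosen only to show how each step can take the previous step's answer as its input so that the final answer depends on every step being solved correctly.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ShortProblem:
    """A toy short-horizon problem (hypothetical structure, for illustration)."""
    template: str                 # text with "{x}" where the input is described
    solve: Callable[[int], int]   # ground-truth solver for this single step


def compose_chain(problems: List[ShortProblem], seed: int) -> Tuple[str, int]:
    """Link problems so each one's input is the previous one's answer.

    Returns the composed prompt and the ground-truth final answer.
    """
    lines = []
    value = seed
    for i, problem in enumerate(problems, start=1):
        # The first step sees the literal seed; later steps refer back
        # to the preceding step, which creates the dependency chain.
        source = str(seed) if i == 1 else f"the answer to step {i - 1}"
        lines.append(f"Step {i}: " + problem.template.format(x=source))
        value = problem.solve(value)
    return "\n".join(lines), value


# Assumed toy problems; a real pipeline would draw from an existing dataset.
problems = [
    ShortProblem("Take {x} and add 4.", lambda x: x + 4),
    ShortProblem("Take {x} and multiply it by 3.", lambda x: x * 3),
    ShortProblem("Take {x} and subtract 5.", lambda x: x - 5),
]

prompt, final_answer = compose_chain(problems, seed=2)
print(prompt)
print("Final answer:", final_answer)
```

Lengthening the `problems` list lengthens the horizon without requiring any new annotation, since the ground-truth final answer is computed from the per-step solvers.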