Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning
Wenxun Wu, Yuanyang Li, Guhan Chen, Linyue Wang, Hongyang Chen
https://arxiv.org/abs/2510.07038
Recent advances in large language models (LLMs) have popularized test-time scaling, where models generate additional reasoning tokens before producing final answers. These approaches have demonstrated significant performance improvements on benchmarks involving mathematical reasoning. However, language models relying solely on direct inference still struggle with tasks demanding up-to-date knowledge or computational tools such as calculators and code interpreters for complex arithmetic operatio…