Tootfinder

Opt-in global Mastodon full text search.

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:58:19

Bridging Offline and Online Reinforcement Learning for LLMs
Jack Lanchantin, Angelica Chen, Janice Lan, Xian Li, Swarnadeep Saha, Tianlu Wang, Jing Xu, Ping Yu, Weizhe Yuan, Jason E Weston, Sainbayar Sukhbaatar, Ilia Kulikov
arxiv.org/abs/2506.21495 arxiv.org/pdf/2506.21495 arxiv.org/html/2506.21495
arXiv:2506.21495v1 Announce Type: new
Abstract: We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensively compare online and semi-online Direct Preference Optimization and Group Reward Policy Optimization objectives, and surprisingly find similar performance and convergence between these variants, which all strongly outperform offline methods. We provide a detailed analysis of the training dynamics and hyperparameter selection strategies to achieve optimal results. Finally, we show that multi-tasking with verifiable and non-verifiable rewards jointly yields improved performance across both task types.

@arXiv_csLG_bot@mastoxiv.page
2025-06-09 10:11:42

carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks
Carolin Benjamins, Helena Graf, Sarah Segel, Difan Deng, Tim Ruhkopf, Leona Hennig, Soham Basu, Neeratyoy Mallik, Edward Bergman, Deyao Chen, François Clément, Matthias Feurer, Katharina Eggensperger, Frank Hutter, Carola Doerr, Marius Lindauer

@arXiv_mathOC_bot@mastoxiv.page
2025-05-30 10:14:28

This arxiv.org/abs/2412.06481 has been replaced.
initial toot: mastoxiv.page/@arXiv_mat…

@arXiv_physicscompph_bot@mastoxiv.page
2025-06-23 09:13:00

Great Restraining Wall in Multidimensional Collective Variable Space
Zhijun Pan, Maodong Li, Dechin Chen, Yi Isaac Yang
arxiv.org/abs/2506.17043

@arXiv_csNE_bot@mastoxiv.page
2025-06-11 07:44:43

A Practical Guide to Tuning Spiking Neuronal Dynamics
William Gebhardt, Alexander G. Ororbia, Nathan McDonald, Clare Thiem, Jack Lombardi
arxiv.org/abs/2506.08138

@arXiv_statML_bot@mastoxiv.page
2025-05-30 10:16:22

This arxiv.org/abs/2502.06044 has been replaced.
initial toot: mastoxiv.page/@arXiv_sta…

@arXiv_csGR_bot@mastoxiv.page
2025-06-10 07:38:52

Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
Chuhao Chen, Zhiyang Dou, Chen Wang, Yiming Huang, Anjun Chen, Qiao Feng, Jiatao Gu, Lingjie Liu
arxiv.org/abs/2506.06440

@arXiv_csLG_bot@mastoxiv.page
2025-06-10 19:23:11

This arxiv.org/abs/2506.05673 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_mathOC_bot@mastoxiv.page
2025-06-19 09:08:07

On the Effectiveness of Classical Regression Methods for Optimal Switching Problems
Martin Andersson, Benny Avelin, Marcus Olofsson
arxiv.org/abs/2506.15436

@arXiv_csLG_bot@mastoxiv.page
2025-06-05 10:52:58

This arxiv.org/abs/2503.22733 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_csLG_bot@mastoxiv.page
2025-06-03 21:31:55

This arxiv.org/abs/2505.00812 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…

@arXiv_mathOC_bot@mastoxiv.page
2025-06-02 07:27:33

Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning
Jinbao Wang, Shiliang Zhang, Jun Liu, Xuehui Ma, Haolin Liu
arxiv.org/abs/2505.24572