
2025-07-08 12:19:30
Predictive posteriors under hidden confounding
Carlos García Meixide, David Ríos Insua
https://arxiv.org/abs/2507.05170 https://
This https://arxiv.org/abs/2412.06481 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_mat…
This https://arxiv.org/abs/2503.22733 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
Mostafa Sadeghi (MULTISPEECH), Jean-Eudes Ayilo (MULTISPEECH), Romain Serizel (MULTISPEECH), Xavier Alameda-Pineda (ROBOTLEARN)
https://arxiv.org/abs/2507.02391
Enhancing Power Flow Estimation with Topology-Aware Gated Graph Neural Networks
Shrenik Jadhav, Birva Sevak, Srijita Das, Wencong Su, Van-Hai Bui
https://arxiv.org/abs/2507.02078 …
This https://arxiv.org/abs/2505.00812 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
This https://arxiv.org/abs/2502.06044 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_sta…
Bridging Offline and Online Reinforcement Learning for LLMs
Jack Lanchantin, Angelica Chen, Janice Lan, Xian Li, Swarnadeep Saha, Tianlu Wang, Jing Xu, Ping Yu, Weizhe Yuan, Jason E Weston, Sainbayar Sukhbaatar, Ilia Kulikov
https://arxiv.org/abs/2506.21495 https://arxiv.org/pdf/2506.21495 https://arxiv.org/html/2506.21495
arXiv:2506.21495v1 Announce Type: new
Abstract: We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensively compare online and semi-online Direct Preference Optimization and Group Reward Policy Optimization objectives, and surprisingly find similar performance and convergence between these variants, which all strongly outperform offline methods. We provide a detailed analysis of the training dynamics and hyperparameter selection strategies to achieve optimal results. Finally, we show that multi-tasking with verifiable and non-verifiable rewards jointly yields improved performance across both task types.
carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks
Carolin Benjamins, Helena Graf, Sarah Segel, Difan Deng, Tim Ruhkopf, Leona Hennig, Soham Basu, Neeratyoy Mallik, Edward Bergman, Deyao Chen, François Clément, Matthias Feurer, Katharina Eggensperger, Frank Hutter, Carola Doerr, Marius Lindauer
https://
Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning
Jinbao Wang, Shiliang Zhang, Jun Liu, Xuehui Ma, Haolin Liu
https://arxiv.org/abs/2505.24572
Great Restraining Wall in Multidimensional Collective Variable Space
Zhijun Pan, Maodong Li, Dechin Chen, Yi Isaac Yang
https://arxiv.org/abs/2506.17043 ht…
Replaced article(s) found for math.OC. https://arxiv.org/list/math.OC/new
[1/1]:
- Global relaxation-based LP-Newton method for multiple hyperparameter selection in support vector ...
Yaru Qian, Qingna Li, Alain Zemkoho
A Practical Guide to Tuning Spiking Neuronal Dynamics
William Gebhardt, Alexander G. Ororbia, Nathan McDonald, Clare Thiem, Jack Lombardi
https://arxiv.org/abs/2506.08138
Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
Chuhao Chen, Zhiyang Dou, Chen Wang, Yiming Huang, Anjun Chen, Qiao Feng, Jiatao Gu, Lingjie Liu
https://arxiv.org/abs/2506.06440
This https://arxiv.org/abs/2506.05673 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
On the Effectiveness of Classical Regression Methods for Optimal Switching Problems
Martin Andersson, Benny Avelin, Marcus Olofsson
https://arxiv.org/abs/2506.15436