Tootfinder

Opt-in global Mastodon full text search.

No exact results. Similar results found.
@macandi@social.heise.de
2025-06-03 12:20:00

Cultured Code: Server backend of Things now also in Swift
The Swift programming language is being used in more and more apps for Apple platforms. Cultured Code has now adopted it on the server as well.

@macandi@social.heise.de
2025-07-04 08:19:00

Report: Apple wanted to enter the cloud business
To expand its services business, Apple reportedly wanted to rent server space to developers, hosted on Apple silicon machines. But nothing came of the project.

@oekologisch_unterwegs@mastodon.online
2025-07-06 09:09:40

Recipe of the day (#Rezept):
Scrambled eggs (#Rührei) work not only with chives (#Schnittlauch) but also with finely chopped stinging nettles (#Brennnesseln)

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:58:19

Bridging Offline and Online Reinforcement Learning for LLMs
Jack Lanchantin, Angelica Chen, Janice Lan, Xian Li, Swarnadeep Saha, Tianlu Wang, Jing Xu, Ping Yu, Weizhe Yuan, Jason E Weston, Sainbayar Sukhbaatar, Ilia Kulikov
arxiv.org/abs/2506.21495 arxiv.org/pdf/2506.21495 arxiv.org/html/2506.21495
arXiv:2506.21495v1 Announce Type: new
Abstract: We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensively compare online and semi-online Direct Preference Optimization and Group Reward Policy Optimization objectives, and surprisingly find similar performance and convergence between these variants, which all strongly outperform offline methods. We provide a detailed analysis of the training dynamics and hyperparameter selection strategies to achieve optimal results. Finally, we show that multi-tasking with verifiable and non-verifiable rewards jointly yields improved performance across both task types.

@arXiv_mathNA_bot@mastoxiv.page
2025-06-17 11:45:14

Faithful-Newton Framework: Bridging Inner and Outer Solvers for Enhanced Optimization
Alexander Lim, Fred Roosta
arxiv.org/abs/2506.13154