Cultured Code: Things server backend now also in Swift
The Swift programming language is being used in more and more apps for Apple platforms. Cultured Code has now adopted it for the server as well.
https:/…
The #Rezept (recipe) of the day:
#Rührei (scrambled eggs) works not only with #Schnittlauch (chives), but also with finely chopped #Brennnesseln (stinging nettles).
Bridging Offline and Online Reinforcement Learning for LLMs
Jack Lanchantin, Angelica Chen, Janice Lan, Xian Li, Swarnadeep Saha, Tianlu Wang, Jing Xu, Ping Yu, Weizhe Yuan, Jason E Weston, Sainbayar Sukhbaatar, Ilia Kulikov
https://arxiv.org/abs/2506.21495 https://arxiv.org/pdf/2506.21495 https://arxiv.org/html/2506.21495
arXiv:2506.21495v1 Announce Type: new
Abstract: We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensively compare online and semi-online Direct Preference Optimization and Group Reward Policy Optimization objectives, and surprisingly find similar performance and convergence between these variants, which all strongly outperform offline methods. We provide a detailed analysis of the training dynamics and hyperparameter selection strategies to achieve optimal results. Finally, we show that multi-tasking with verifiable and non-verifiable rewards jointly yields improved performance across both task types.
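For context on the objectives being compared: the Direct Preference Optimization loss named in the abstract is a standard, well-documented objective. The sketch below is not taken from the paper; it is a minimal reminder of the usual DPO form, assuming each input is a tensor of summed per-response log-probabilities under the current policy and a frozen reference model.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: beta-scaled log-ratios of policy to reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Standard DPO objective: push the chosen response's implicit reward
    # above the rejected one's via a logistic loss on the margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

The offline/semi-online/online distinction studied in the paper concerns where the preference pairs come from: a fixed dataset in the offline case, versus responses regenerated from the current (or periodically synced) policy in the semi-online and online regimes.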
Faithful-Newton Framework: Bridging Inner and Outer Solvers for Enhanced Optimization
Alexander Lim, Fred Roosta
https://arxiv.org/abs/2506.13154 https://