Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@arXiv_csLG_bot@mastoxiv.page
2025-07-14 08:19:51

Low-rank Momentum Factorization for Memory Efficient Training
Pouria Mahdavinia, Mehrdad Mahdavi
arxiv.org/abs/2507.08091 arxiv.org/pdf/2507.08091 arxiv.org/html/2507.08091
arXiv:2507.08091v1 Announce Type: new
Abstract: Fine-tuning large foundation models presents significant memory challenges due to stateful optimizers like AdamW, often requiring several times more GPU memory than inference. While memory-efficient methods like parameter-efficient fine-tuning (e.g., LoRA) and optimizer state compression exist, recent approaches like GaLore bridge these by using low-rank gradient projections and subspace moment accumulation. However, such methods may struggle with fixed subspaces or computationally costly offline resampling (e.g., requiring full-matrix SVDs). We propose Momentum Factorized SGD (MoFaSGD), which maintains a dynamically updated low-rank SVD representation of the first-order momentum, closely approximating its full-rank counterpart throughout training. This factorization enables a memory-efficient fine-tuning method that adaptively updates the optimization subspace at each iteration. Crucially, MoFaSGD leverages the computed low-rank momentum factors to perform efficient spectrally normalized updates, offering an alternative to subspace moment accumulation. We establish theoretical convergence guarantees for MoFaSGD, proving it achieves an optimal rate for non-convex stochastic optimization under standard assumptions. Empirically, we demonstrate MoFaSGD's effectiveness on large language model alignment benchmarks, achieving a competitive trade-off between memory reduction (comparable to LoRA) and performance compared to state-of-the-art low-rank optimization methods. Our implementation is available at github.com/pmahdavi/MoFaSGD.
toXiv_bot_toot
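The abstract sketches the core loop: keep the first-order momentum as a rank-r SVD factorization and take spectrally normalized steps built from those factors. Below is a minimal NumPy sketch of that idea; the function names, rank, and hyperparameters are illustrative assumptions and not the paper's implementation (see github.com/pmahdavi/MoFaSGD for that).

import numpy as np

def truncated_svd(A, r):
    # Best rank-r approximation of A via a full SVD (kept simple for the sketch).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]

def mofasgd_like_step(W, grad, U, s, Vt, lr=1e-3, beta=0.9, r=4):
    # One hypothetical update step. Only the factors (U, s, Vt) of the
    # momentum are stored between steps, not the full momentum matrix.
    # Blend the new gradient into the (reconstructed) momentum ...
    M = beta * (U * s) @ Vt + (1.0 - beta) * grad
    # ... and re-truncate to rank r. A practical method would update the
    # factors incrementally rather than recomputing a full SVD here.
    U, s, Vt = truncated_svd(M, r)
    # Spectrally normalized direction: the momentum with its singular
    # values replaced by 1, i.e. U @ Vt.
    W = W - lr * (U @ Vt)
    return W, U, s, Vt

For a first step one could initialize the factors from a truncated SVD of the initial gradient; the rank r controls the memory/accuracy trade-off the abstract refers to.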

@arXiv_csSE_bot@mastoxiv.page
2025-06-10 17:14:20

This arxiv.org/abs/2505.13229 has been replaced.
initial toot: mastoxiv.page/@arXiv_csSE_…

@arXiv_grqc_bot@mastoxiv.page
2025-07-04 09:25:31

Cosmography of Non-Interacting Ghost and Generalized Ghost Dark Energy Models in f(Q) Gravity
M. Sharif, Madiha Ajmal
arxiv.org/abs/2507.02280

@mxp@mastodon.acm.org
2025-05-30 20:35:37

«The cause: an accumulation of roadworks that is congesting the entire city.» Ah, of course, cars are never the problem.
rts.ch/info/regions/geneve/202

@arXiv_quantph_bot@mastoxiv.page
2025-06-02 10:28:14

This arxiv.org/abs/2404.11444 has been replaced.
initial toot: mastoxiv.page/@arXiv_qu…

@arXiv_csLG_bot@mastoxiv.page
2025-06-09 10:13:42

Corrector Sampling in Language Models
Itai Gat, Neta Shaul, Uriel Singer, Yaron Lipman
arxiv.org/abs/2506.06215 arxiv…

@arXiv_econGN_bot@mastoxiv.page
2025-06-10 08:22:32

Do conditional cash transfers in childhood increase economic resilience in adulthood? Evidence from the COVID-19 pandemic shock in Ecuador
José-Ignacio Antón, Ruthy Intriago, Juan Ponce
arxiv.org/abs/2506.06903

@arXiv_physicsinsdet_bot@mastoxiv.page
2025-07-11 08:45:41

A Prototype Hybrid Mode Cavity for Heterodyne Axion Detection
Zenghai Li, Kevin Zhou, Marco Oriunno, Asher Berlin, Sergio Calatroni, Raffaele Tito D'Agnolo, Sebastian A. R. Ellis, Philip Schuster, Sami G. Tantawi, Natalia Toro
arxiv.org/abs/2507.07173

@shriramk@mastodon.social
2025-06-21 02:45:22

The number and breadth of extremely mainstream businesses who aren't afraid to participate in Rhode Island PrideFest is wonderful, given the way companies have been targeted just for acknowledging basic human rights. Nice to live here.

Lots. It's a full page of small print.

@arXiv_csLG_bot@mastoxiv.page
2025-06-10 19:21:11

This arxiv.org/abs/2505.23868 has been replaced.
initial toot: mastoxiv.page/@arXiv_csLG_…