Tootfinder

Opt-in global Mastodon full-text search. Join the index!

@arXiv_csLG_bot@mastoxiv.page
2025-07-14 08:19:51

Low-rank Momentum Factorization for Memory Efficient Training
Pouria Mahdavinia, Mehrdad Mahdavi
arxiv.org/abs/2507.08091 arxiv.org/pdf/2507.08091 arxiv.org/html/2507.08091
arXiv:2507.08091v1 Announce Type: new
Abstract: Fine-tuning large foundation models presents significant memory challenges due to stateful optimizers like AdamW, often requiring several times more GPU memory than inference. While memory-efficient methods like parameter-efficient fine-tuning (e.g., LoRA) and optimizer state compression exist, recent approaches like GaLore bridge these by using low-rank gradient projections and subspace moment accumulation. However, such methods may struggle with fixed subspaces or computationally costly offline resampling (e.g., requiring full-matrix SVDs). We propose Momentum Factorized SGD (MoFaSGD), which maintains a dynamically updated low-rank SVD representation of the first-order momentum, closely approximating its full-rank counterpart throughout training. This factorization enables a memory-efficient fine-tuning method that adaptively updates the optimization subspace at each iteration. Crucially, MoFaSGD leverages the computed low-rank momentum factors to perform efficient spectrally normalized updates, offering an alternative to subspace moment accumulation. We establish theoretical convergence guarantees for MoFaSGD, proving it achieves an optimal rate for non-convex stochastic optimization under standard assumptions. Empirically, we demonstrate MoFaSGD's effectiveness on large language model alignment benchmarks, achieving a competitive trade-off between memory reduction (comparable to LoRA) and performance compared to state-of-the-art low-rank optimization methods. Our implementation is available at github.com/pmahdavi/MoFaSGD.
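The abstract above describes the core loop of MoFaSGD: maintain a truncated-SVD factorization of the first-order momentum, refresh it as each gradient arrives, and step along the spectrally normalized directions. The sketch below is a loose, illustrative reading of that description only; the function name, hyperparameters, and the use of a full `np.linalg.svd` per step (the paper maintains the factorization incrementally and memory-efficiently) are assumptions, not the authors' implementation — see github.com/pmahdavi/MoFaSGD for the real one.

```python
import numpy as np

def mofasgd_step(W, grad, U, S, Vt, beta=0.9, lr=1e-3, rank=4):
    """One illustrative optimizer step on a weight matrix W.

    (U, S, Vt) is the current rank-`rank` SVD factorization of the
    momentum. All details here are guessed from the abstract.
    """
    # Blend the new stochastic gradient into the reconstructed momentum.
    M = beta * (U * S) @ Vt + (1.0 - beta) * grad
    # Re-factorize and truncate back to rank r. A full SVD is used here
    # purely for clarity; the paper's point is to avoid exactly this cost
    # by updating the factors dynamically.
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    U, S, Vt = U[:, :rank], S[:rank], Vt[:rank, :]
    # Spectrally normalized update: step along the singular directions
    # U @ Vt (an orthogonal-factor product with unit spectral norm),
    # discarding the raw singular-value magnitudes.
    W = W - lr * (U @ Vt)
    return W, U, S, Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
grad = rng.standard_normal((8, 6))
# Cold-start the momentum factors at zero.
U, S, Vt = np.zeros((8, 4)), np.zeros(4), np.zeros((4, 6))
W, U, S, Vt = mofasgd_step(W, grad, U, S, Vt)
```

Note the memory angle this is meant to illustrate: for a d×d layer the optimizer stores only the rank-r factors (roughly 2dr + r numbers) instead of a dense d×d momentum buffer, which is where the LoRA-comparable memory footprint comes from.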

@arXiv_csAI_bot@mastoxiv.page
2025-09-03 13:54:33

Dynamic Speculative Agent Planning
Yilin Guan, Wenyue Hua, Qingfeng Lan, Sun Fei, Dujian Ding, Devang Acharya, Chi Wang, William Yang Wang
arxiv.org/abs/2509.01920

@arXiv_csLG_bot@mastoxiv.page
2025-09-10 10:23:31

IP-Basis PINNs: Efficient Multi-Query Inverse Parameter Estimation
Shalev Manor, Mohammad Kohandel
arxiv.org/abs/2509.07245 arxiv.org/pdf/2…

@arXiv_statME_bot@mastoxiv.page
2025-06-30 09:08:20

Change Point Localization and Inference in Dynamic Multilayer Networks
Fan Wang, Kyle Ritscher, Yik Lun Kei, Xin Ma, Oscar Hernan Madrid Padilla
arxiv.org/abs/2506.21878

@arXiv_csCE_bot@mastoxiv.page
2025-08-04 08:51:31

Online Fine-Tuning of Carbon Emission Predictions using Real-Time Recurrent Learning for State Space Models
Julian Lemmel, Manuel Kranzl, Adam Lamine, Philipp Neubauer, Radu Grosu, Sophie Neubauer
arxiv.org/abs/2508.00804

@arXiv_csCV_bot@mastoxiv.page
2025-08-22 10:17:41

MapKD: Unlocking Prior Knowledge with Cross-Modal Distillation for Efficient Online HD Map Construction
Ziyang Yan, Ruikai Li, Zhiyong Cui, Bohan Li, Han Jiang, Yilong Ren, Aoyong Li, Zhenning Li, Sijia Wen, Haiyang Yu
arxiv.org/abs/2508.15653

@arXiv_eessSY_bot@mastoxiv.page
2025-08-29 08:55:11

Delay-adaptive Control of Nonlinear Systems with Approximate Neural Operator Predictors
Luke Bhan, Miroslav Krstic, Yuanyuan Shi
arxiv.org/abs/2508.20367

@arXiv_csCV_bot@mastoxiv.page
2025-08-26 12:32:47

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, Zhaokai Wang, Zhe Chen, Hongjie Zhang, Ganlin Yang, Haomin Wang, Qi Wei, Jinhui Yin, Wenhao Li, Erfei Cui, Guanzhou Chen, Zichen Ding, Changyao Tian, Zhenyu Wu, Jingjing Xie, Zehao Li, Bowen Yang, Yuchen Duan, Xuehui Wang, Songze Li, Xiangyu Zhao, Haodong Duan, Nianche…

@arXiv_statCO_bot@mastoxiv.page
2025-08-04 09:12:41

Online Rolling Controlled Sequential Monte Carlo
Liwen Xue, Axel Finke, Adam M. Johansen
arxiv.org/abs/2508.00696 arxiv.org/pdf/2508.00696

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:15:40

Revisiting Diffusion Q-Learning: From Iterative Denoising to One-Step Action Generation
Thanh Nguyen, Chang D. Yoo
arxiv.org/abs/2508.13904