Tootfinder

@arXiv_csPF_bot@mastoxiv.page
2026-04-01 07:49:27

Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs
Yi Liu
https://arxiv.org/abs/2603.28823 https://arxiv.org/pdf/2603.28823 https://arxiv.org/html/2603.28823
arXiv:2603.28823v1 Announce Type: new
Abstract: Scaling laws relate model quality to compute budget (FLOPs), but practitioners face wall-clock time constraints, not compute budgets. We study optimal model sizing under fixed time budgets from 5 minutes to 24 hours on consumer GPUs (RTX 4090). Across 70 runs spanning 50M--1031M parameters, we find: (1)~at each time budget a U-shaped curve emerges where too-small models overfit and too-large models undertrain; (2)~optimal model size follows $N^* \propto t^{0.60}$, growing \emph{faster} than Chinchilla's $N^* \propto C^{0.50}$, with $\alpha = 0.60 \pm 0.07$ robustly exceeding compute-optimal across all sensitivity analyses; (3)~a \emph{dual U-shape mechanism}: short-budget U-curves arise from compute bottlenecks, while long-budget U-curves emerge from data bottlenecks (overfitting), with an intermediate regime where the U-curve temporarily disappears. These findings have immediate implications for researchers training on consumer hardware, where wall-clock time -- not FLOPs -- is the binding constraint. We release all code, logs, and 70 experimental configurations.
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 16:07:37

Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[1/6]:
- Towards Attributions of Input Variables in a Coalition
Xinhao Zheng, Huiqi Deng, Quanshi Zhang
https://arxiv.org/abs/2309.13411
- Knee or ROC
Veronica Wendt, Jacob Steiner, Byunggu Yu, Caleb Kelly, Justin Kim
https://arxiv.org/abs/2401.07390
- Rethinking Disentanglement under Dependent Factors of Variation
Antonio Almud\'evar, Alfonso Ortega
https://arxiv.org/abs/2408.07016 https://mastoxiv.page/@arXiv_csLG_bot/112959235461894530
- Minibatch Optimal Transport and Perplexity Bound Estimation in Discrete Flow Matching
Etrit Haxholli, Yeti Z. Gurbuz, Ogul Can, Eli Waxman
https://arxiv.org/abs/2411.00759 https://mastoxiv.page/@arXiv_csLG_bot/113423933393275133
- Predicting Subway Passenger Flows under Incident Situation with Causality
Xiannan Huang, Shuhan Qiu, Quan Yuan, Chao Yang
https://arxiv.org/abs/2412.06871 https://mastoxiv.page/@arXiv_csLG_bot/113632934357523592
- Characterizing LLM Inference Energy-Performance Tradeoffs across Workloads and GPU Scaling
Paul Joe Maliakel, Shashikant Ilager, Ivona Brandic
https://arxiv.org/abs/2501.08219 https://mastoxiv.page/@arXiv_csLG_bot/113831081884570770
- Universality of Benign Overfitting in Binary Linear Classification
Ichiro Hashimoto, Stanislav Volgushev, Piotr Zwiernik
https://arxiv.org/abs/2501.10538 https://mastoxiv.page/@arXiv_csLG_bot/113872351652969955
- Safe Reinforcement Learning for Real-World Engine Control
Julian Bedei, Lucas Koch, Kevin Badalian, Alexander Winkler, Patrick Schaber, Jakob Andert
https://arxiv.org/abs/2501.16613 https://mastoxiv.page/@arXiv_csLG_bot/113910356206562660
- A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
Roman Tarasov, Petr Mokrov, Milena Gazdieva, Evgeny Burnaev, Alexander Korotin
https://arxiv.org/abs/2502.01310
- Improving the Convergence of Private Shuffled Gradient Methods with Public Data
Shuli Jiang, Pranay Sharma, Zhiwei Steven Wu, Gauri Joshi
https://arxiv.org/abs/2502.03652 https://mastoxiv.page/@arXiv_csLG_bot/113961314098841096
- Using the Path of Least Resistance to Explain Deep Networks
Sina Salek, Joseph Enguehard
https://arxiv.org/abs/2502.12108 https://mastoxiv.page/@arXiv_csLG_bot/114023706252106865
- Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Wenzhe Yin, Zehao Xiao, Pan Zhou, Shujian Yu, Jiayi Shen, Jan-Jakob Sonke, Efstratios Gavves
https://arxiv.org/abs/2502.17028 https://mastoxiv.page/@arXiv_csLG_bot/114063477202397951
- Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster
Sharan Vaswani, Reza Babanezhad
https://arxiv.org/abs/2503.00229 https://mastoxiv.page/@arXiv_csLG_bot/114103018985567633
- Semantic Parallelism: Redefining Efficient MoE Inference via Model-Data Co-Scheduling
Yan Li, Zhenyu Zhang, Zhengang Wang, Pengfei Chen, Pengfei Zheng
https://arxiv.org/abs/2503.04398 https://mastoxiv.page/@arXiv_csLG_bot/114120014622063602
- A Survey on Federated Fine-tuning of Large Language Models
Wu, Tian, Li, Sun, Tam, Zhou, Liao, Xiong, Guo, Li, Xu
https://arxiv.org/abs/2503.12016 https://mastoxiv.page/@arXiv_csLG_bot/114182234054681647
- Towards Trustworthy GUI Agents: A Survey
Yucheng Shi, Wenhao Yu, Jingyuan Huang, Wenlin Yao, Wenhu Chen, Ninghao Liu
https://arxiv.org/abs/2503.23434 https://mastoxiv.page/@arXiv_csLG_bot/114263024618476521
- CONTINA: Confidence Interval for Traffic Demand Prediction with Coverage Guarantee
Chao Yang, Xiannan Huang, Shuhan Qiu, Yan Cheng
https://arxiv.org/abs/2504.13961 https://mastoxiv.page/@arXiv_csLG_bot/114380404041503229
- Regularity and Stability Properties of Selective SSMs with Discontinuous Gating
Nikola Zubi\'c, Davide Scaramuzza
https://arxiv.org/abs/2505.11602 https://mastoxiv.page/@arXiv_csLG_bot/114538965060456498
- RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization
Alonso Urbano, David W. Romero, Max Zimmer, Sebastian Pokutta
https://arxiv.org/abs/2505.13289 https://mastoxiv.page/@arXiv_csLG_bot/114539124884913788
- RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
Yilang Zhang, Bingcong Li, Georgios B. Giannakis
https://arxiv.org/abs/2505.18877 https://mastoxiv.page/@arXiv_csLG_bot/114578778213033886
- SuperMAN: Interpretable and Expressive Networks over Temporally Sparse Heterogeneous Data
Bechler-Speicher, Zerio, Huri, Vestergaard, Gilad-Bachrach, Jess, Bhatt, Sazonovs
https://arxiv.org/abs/2505.19193 https://mastoxiv.page/@arXiv_csLG_bot/114578790124778172
toXiv_bot_toot

Tootfinder

Opt-in global Mastodon full text search. Join the index!