2025-10-15 09:49:02
Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
Jiaqi Li, Zhipeng Lou, Johannes Schmidt-Hieber, Wei Biao Wu
https://arxiv.org/abs/2510.12013 https://
Adaptive Conditional Gradient Descent
Abbas Khademi, Antonio Silveti-Falls
https://arxiv.org/abs/2510.11440 https://arxiv.org/pdf/2510.11440
New Classes of Non-monotone Variational Inequality Problems Solvable via Proximal Gradient on Smooth Gap Functions
Lei Zhao, Daoli Zhu, Shuzhong Zhang
https://arxiv.org/abs/2510.12105
Stochastic Gradient Descent for Incomplete Tensor Linear Systems
Anna Ma, Deanna Needell, Alexander Xue
https://arxiv.org/abs/2510.07630 https://arxiv.org/…
Correlating Cross-Iteration Noise for DP-SGD using Model Curvature
Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng
https://arxiv.org/abs/2510.05416 https:…
On the $O(1/T)$ Convergence of Alternating Gradient Descent-Ascent in Bilinear Games
Tianlong Nan, Shuvomoy Das Gupta, Garud Iyengar, Christian Kroer
https://arxiv.org/abs/2510.03855
Minimizing smooth Kurdyka-Łojasiewicz functions via generalized descent methods: Convergence rate and complexity
Masoud Ahookhosh, Susan Ghaderi, Alireza Kabgani, Morteza Rahimi
https://arxiv.org/abs/2511.10414 https://arxiv.org/pdf/2511.10414 https://arxiv.org/html/2511.10414
arXiv:2511.10414v1 Announce Type: new
Abstract: This paper studies a generalized descent algorithm (DEAL) for minimizing smooth functions, analyzed under the Kurdyka-Łojasiewicz (KL) inequality. The algorithm guarantees sufficient decrease by adapting to the geometry of the cost function. We leverage the KL property to establish global convergence, convergence rates, and complexity bounds, with particular focus on the linear convergence of generalized descent methods. We show that both the constant step-size and Armijo line-search strategies along a generalized descent direction satisfy our generalized descent condition. Additionally, for nonsmooth functions, by leveraging smoothing techniques such as the forward-backward and high-order Moreau envelopes, we show that the boosted proximal gradient (BPGA) and boosted high-order proximal-point (BPPA) methods are also special cases of DEAL. Notably, if the order of the high-order proximal term is chosen appropriately (depending on the KL exponent), the sequence generated by BPPA converges linearly for an arbitrary KL exponent. Preliminary numerical experiments on inverse problems and LASSO demonstrate the efficiency of the proposed methods and validate our theoretical findings.
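As a side note, here is a minimal sketch of one step-size rule named in the abstract: an Armijo backtracking line search along a steepest-descent direction, enforcing the sufficient-decrease condition that the DEAL analysis relies on. The quadratic test function and all parameter values below are illustrative assumptions, not taken from the paper.

import numpy as np

def armijo_descent_step(f, grad_f, x, c=1e-4, beta=0.5, t0=1.0):
    """One generalized descent step with Armijo backtracking for sufficient decrease."""
    g = grad_f(x)
    d = -g                                   # descent direction (here: steepest descent)
    t, fx = t0, f(x)
    # Armijo condition: f(x + t*d) <= f(x) + c * t * <grad f(x), d>
    while f(x + t * d) > fx + c * t * g.dot(d):
        t *= beta                            # shrink the step until sufficient decrease holds
    return x + t * d

# Illustrative use on a smooth strongly convex quadratic (a KL function with exponent 1/2).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
x = np.array([1.0, -1.0])
for _ in range(50):
    x = armijo_descent_step(f, grad_f, x)
print(f(x))  # approaches the minimum value 0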
Audible Networks: Deconstructing and Manipulating Sounds with Deep Non-Negative Autoencoders
Juan José Burred, Carmine-Emanuele Cella
https://arxiv.org/abs/2510.08816 http…
On the Theory of Continual Learning with Gradient Descent for Neural Networks
Hossein Taheri, Avishek Ghosh, Arya Mazumdar
https://arxiv.org/abs/2510.05573 https://
Randomized HyperSteiner: A Stochastic Delaunay Triangulation Heuristic for the Hyperbolic Steiner Minimal Tree
Aniss Aiman Medbouhi, Alejandro García-Castellanos, Giovanni Luca Marchetti, Daniel Pelt, Erik J Bekkers, Danica Kragic
https://arxiv.org/abs/2510.09328
Global Convergence of Four-Layer Matrix Factorization under Random Initialization
Minrui Luo, Weihang Xu, Xiang Gao, Maryam Fazel, Simon Shaolei Du
https://arxiv.org/abs/2511.09925 https://arxiv.org/pdf/2511.09925 https://arxiv.org/html/2511.09925
arXiv:2511.09925v1 Announce Type: new
Abstract: Gradient descent dynamics on the deep matrix factorization problem are extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of the gradient descent dynamics, and extends previous theories to characterize the change in eigenvalues of the layer weights.
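For a concrete picture of the setting (not the authors' code or experiments), the sketch below runs randomly initialized gradient descent on a four-layer factorization loss with a standard balanced regularization term; the target matrix, layer widths, and hyperparameters are assumptions chosen only so the toy run behaves reasonably.

import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes and hyperparameters for a toy run (not taken from the paper).
d, lam, lr, steps = 10, 0.1, 0.005, 5000
A0 = rng.standard_normal((d, d))
M = A0 @ A0.T / d                    # a symmetric positive semidefinite target matrix

# Small random initialization of the four layers W1, ..., W4.
scale = 0.3
W = [scale * rng.standard_normal((d, d)) for _ in range(4)]   # W[0] = W1, ..., W[3] = W4

def loss(W):
    E = W[3] @ W[2] @ W[1] @ W[0] - M
    bal = sum(np.linalg.norm(W[i + 1].T @ W[i + 1] - W[i] @ W[i].T) ** 2 for i in range(3))
    return 0.5 * np.linalg.norm(E) ** 2 + 0.5 * lam * bal

print("initial loss:", loss(W))
for t in range(steps):
    W1, W2, W3, W4 = W
    E = W4 @ W3 @ W2 @ W1 - M
    # gradients of the fit term 0.5 * ||W4 W3 W2 W1 - M||_F^2
    g = [(W4 @ W3 @ W2).T @ E,
         (W4 @ W3).T @ E @ W1.T,
         W4.T @ E @ (W2 @ W1).T,
         E @ (W3 @ W2 @ W1).T]
    # gradients of the balanced regularizer (lam/2) * sum_i ||W_{i+1}^T W_{i+1} - W_i W_i^T||_F^2
    for i in range(3):
        D = W[i + 1].T @ W[i + 1] - W[i] @ W[i].T
        g[i + 1] += 2 * lam * W[i + 1] @ D
        g[i] -= 2 * lam * D @ W[i]
    W = [Wi - lr * gi for Wi, gi in zip(W, g)]
print("final loss:  ", loss(W))      # should end up far below the initial loss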
Training thermodynamic computers by gradient descent
Stephen Whitelam
https://arxiv.org/abs/2509.15324 https://arxiv.org/pdf/2509.15324
Low-Discrepancy Set Post-Processing via Gradient Descent
François Clément, Linhang Huang, Woorim Lee, Cole Smidt, Braeden Sodt, Xuan Zhang
https://arxiv.org/abs/2511.10496 https://arxiv.org/pdf/2511.10496 https://arxiv.org/html/2511.10496
arXiv:2511.10496v1 Announce Type: new
Abstract: The construction of low-discrepancy sets, used for uniform sampling and numerical integration, has recently seen great improvements based on optimization and machine learning techniques. However, these methods are computationally expensive, often requiring days of computation or access to GPU clusters. We show that simple gradient descent-based techniques allow for comparable results when starting with a reasonably uniform point set. Not only is this method much more efficient and accessible, but it can be applied as post-processing to any low-discrepancy set generation method for a variety of standard discrepancy measures.
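To illustrate the kind of post-processing described here, the sketch below applies plain gradient descent to the closed-form (Warnock) squared L2 star discrepancy of a point set, starting from a jittered grid as a stand-in for an existing low-discrepancy set. The choice of discrepancy measure, starting set, step size, and clipping are assumptions for this sketch, not the paper's exact procedure.

import numpy as np

def l2_star_discrepancy_sq(X):
    """Warnock's closed form for the squared L2 star discrepancy of points X in [0, 1]^d."""
    n, d = X.shape
    t2 = np.prod((1.0 - X ** 2) / 2.0, axis=1)                              # (n,)
    t3 = np.prod(1.0 - np.maximum(X[:, None, :], X[None, :, :]), axis=2)    # (n, n)
    return 3.0 ** (-d) - 2.0 / n * t2.sum() + t3.sum() / n ** 2

def grad_l2_star(X):
    """Gradient of the squared L2 star discrepancy with respect to the point coordinates."""
    n, d = X.shape
    G = np.zeros_like(X)
    pair_max = np.maximum(X[:, None, :], X[None, :, :])                     # (n, n, d)
    pair_prod = np.prod(1.0 - pair_max, axis=2)                             # (n, n)
    pnt_prod = np.prod((1.0 - X ** 2) / 2.0, axis=1)                        # (n,)
    for i in range(n):
        for k in range(d):
            # derivative of the single-point term -(2/n) * sum_i prod_l (1 - x_{il}^2)/2
            rest = pnt_prod[i] / ((1.0 - X[i, k] ** 2) / 2.0)               # product over l != k
            G[i, k] += (2.0 / n) * X[i, k] * rest
            # derivative of the pairwise term (1/n^2) * sum_{a,b} prod_l (1 - max(x_{al}, x_{bl}))
            active = X[i, k] >= X[:, k]                                     # coordinate k attains the max
            rest_pair = pair_prod[i] / (1.0 - pair_max[i, :, k])            # product over l != k
            contrib = np.where(active, -rest_pair, 0.0)
            contrib[i] *= 0.5                                               # the (i, i) pair appears only once
            G[i, k] += (2.0 / n ** 2) * contrib.sum()
    return G

# Post-process a jittered grid, standing in for an existing low-discrepancy set.
rng = np.random.default_rng(0)
m = 8
g = (np.arange(m) + 0.5) / m
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
X = np.clip(X + 0.02 * rng.standard_normal(X.shape), 1e-3, 1 - 1e-3)

print("before:", l2_star_discrepancy_sq(X))
for _ in range(500):
    X = np.clip(X - 0.1 * grad_l2_star(X), 1e-3, 1 - 1e-3)                  # keep points inside (0, 1)
print("after: ", l2_star_discrepancy_sq(X))                                 # typically smaller than before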
Learning single index model with gradient descent: spectral initialization and precise asymptotics
Yuchen Chen, Yandi Shen
https://arxiv.org/abs/2509.23527 https://
Adaptive Kernel Selection for Stein Variational Gradient Descent
Moritz Melcher, Simon Weissmann, Ashia C. Wilson, Jakob Zech
https://arxiv.org/abs/2510.02067 https://
Towards Fast Option Pricing PDE Solvers Powered by PIELM
Akshay Govind Srinivasan, Anuj Jagannath Said, Sathwik Pentela, Vikas Dwivedi, Balaji Srinivasan
https://arxiv.org/abs/2510.04322
Federated Learning of Quantile Inference under Local Differential Privacy
Leheng Cai, Qirui Hu, Shuyuan Wu
https://arxiv.org/abs/2509.21800 https://arxiv.o…
Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
Runqian Wang, Yilun Du
https://arxiv.org/abs/2510.02300 https://arxiv.org/pdf/2…
Replaced article(s) found for cs.LO. https://arxiv.org/list/cs.LO/new
[1/1]:
- Compact Rule-Based Classifier Learning via Gradient Descent
Javier Fumanal-Idocin, Raquel Fernandez-Peralta, Javier Andreu-Perez
Quantitative Convergence Analysis of Projected Stochastic Gradient Descent for Non-Convex Losses via the Goldstein Subdifferential
Yuping Zheng, Andrew Lamperski
https://arxiv.org/abs/2510.02735
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Haodong Liang, Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai
https://arxiv.org/abs/2509.22794
S-D-RSM: Stochastic Distributed Regularized Splitting Method for Large-Scale Convex Optimization Problems
Maoran Wang, Xingju Cai, Yongxin Chen
https://arxiv.org/abs/2511.10133 https://arxiv.org/pdf/2511.10133 https://arxiv.org/html/2511.10133
arXiv:2511.10133v1 Announce Type: new
Abstract: This paper investigates large-scale distributed composite convex optimization problems, motivated by a broad range of applications including multi-agent systems, federated learning, smart grids, wireless sensor networks, and compressed sensing. Stochastic gradient descent (SGD) and its variants are commonly employed to solve such problems. However, existing algorithms often rely on vanishing step sizes or strong convexity assumptions, or entail substantial computational overhead to ensure convergence or obtain favorable complexity. To bridge the gap between theory and practice, we integrate consensus optimization and operator splitting techniques (see Problem Reformulation) to develop a novel stochastic splitting algorithm, termed the \emph{stochastic distributed regularized splitting method} (S-D-RSM). In practice, S-D-RSM performs parallel updates of proximal mappings and gradient information for only a randomly selected subset of agents at each iteration. By introducing regularization terms, it effectively mitigates consensus discrepancies among distributed nodes. In contrast to conventional stochastic methods, our theoretical analysis establishes that S-D-RSM achieves global convergence without requiring diminishing step sizes or strong convexity assumptions. Furthermore, it achieves an iteration complexity of $\mathcal{O}(1/\epsilon)$ with respect to both the objective function value and the consensus error. Numerical experiments show that S-D-RSM achieves up to a 2--3$\times$ speedup over state-of-the-art baselines while maintaining comparable or better accuracy. These results not only validate the algorithm's theoretical guarantees but also demonstrate its effectiveness in practical tasks such as compressed sensing and empirical risk minimization.
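The exact S-D-RSM updates are not reproduced here; the sketch below only illustrates the general pattern the abstract describes: at each iteration a randomly selected subset of agents takes a gradient step on its local loss with a regularization pull toward a shared variable, and the shared variable is then updated through a proximal mapping. The toy problem, update rules, and constants are assumptions, not the paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Toy distributed problem: minimize  sum_i 0.5*||A_i x - b_i||^2 + mu*||x||_1.
n_agents, d, m, mu = 10, 20, 30, 0.05
x_true = np.zeros(d)
x_true[:3] = [1.0, -2.0, 0.5]
A = [rng.standard_normal((m, d)) for _ in range(n_agents)]
b = [Ai @ x_true + 0.01 * rng.standard_normal(m) for Ai in A]

def soft_threshold(v, t):
    """Proximal mapping of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x_loc = [np.zeros(d) for _ in range(n_agents)]    # local variables, one per agent
z = np.zeros(d)                                   # shared (consensus) variable
step, rho, batch = 0.005, 1.0, 3
for it in range(2000):
    picked = rng.choice(n_agents, size=batch, replace=False)
    for i in picked:                              # these updates could run in parallel
        grad_i = A[i].T @ (A[i] @ x_loc[i] - b[i])
        # local gradient step plus a regularization pull toward the shared variable
        x_loc[i] = x_loc[i] - step * (grad_i + rho * (x_loc[i] - z))
    # shared update: proximal (soft-thresholding) step at the average of the local variables;
    # the threshold mu/rho is a heuristic choice for this sketch
    z = soft_threshold(np.mean(x_loc, axis=0), mu / rho)

print(np.linalg.norm(z - x_true) / np.linalg.norm(x_true))   # typically well below 1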
From Next Token Prediction to (STRIPS) World Models -- Preliminary Results
Carlos Núñez-Molina, Vicenç Gómez, Hector Geffner
https://arxiv.org/abs/2509.13389 h…
A Universal Banach--Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training
Johnny R. Zhang (Independent Researcher), Xiaomei Mi (University of Manchester), Gaoyuan Du (Amazon), Qianyi Sun (Microsoft), Shiqi Wang (Meta), Jiaxuan Li (Amazon), Wenhua Zhou (Independent Researcher)
https://arx…
Crosslisted article(s) found for cs.CE. https://arxiv.org/list/cs.CE/new
[1/1]:
- Fast training of accurate physics-informed neural networks without gradient descent
Datar, Kapoor, Chandra, Sun, Bolager, Burak, Veselovska, Fornasier, Dietrich
DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
Yuen Chen, Yian Wang, Hari Sundaram
https://arxiv.org/abs/2509.16173 https://
Crosslisted article(s) found for math.ST. https://arxiv.org/list/math.ST/new
[1/1]:
- Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Haodong Liang, Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai
Replaced article(s) found for stat.ML. https://arxiv.org/list/stat.ML/new
[2/2]:
- Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Shuang Liang, Guido Montúfar
Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds
Mert G\"urb\"uzbalaban, Yasa Syed, Necdet Serhat Aybat
https://arxiv.org/abs/2509.13628
Reverse Engineering of Music Mixing Graphs with Differentiable Processors and Iterative Pruning
Sungho Lee, Marco Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji
https://arxiv.org/abs/2509.15948
Effective continuous equations for adaptive SGD: a stochastic analysis view
Luca Callisti, Marco Romito, Francesco Triggiano
https://arxiv.org/abs/2509.21614 https://
Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization
Jingfeng Wu, Peter L. Bartlett, Jason D. Lee, Sham M. Kakade, Bin Yu
https://arxiv.org/abs/2509.17251
Autoregressive Processes on Stiefel and Grassmann Manifolds
Jordi-Lluís Figueras, Aron Persson
https://arxiv.org/abs/2509.24767 https://arxiv.org/pdf…
A Frank-Wolfe Algorithm for Strongly Monotone Variational Inequalities
Reza Rahimi Baghbadorani, Peyman Mohajerin Esfahani, Sergio Grammatico
https://arxiv.org/abs/2510.03842 ht…
Learning Polynomial Activation Functions for Deep Neural Networks
Linghao Zhang, Jiawang Nie, Tingting Tang
https://arxiv.org/abs/2510.03682 https://arxiv.…
Guaranteed Noisy CP Tensor Recovery via Riemannian Optimization on the Segre Manifold
Ke Xu, Yuefeng Han
https://arxiv.org/abs/2510.00569 https://arxiv.org…
Quantitative convergence of trained single layer neural networks to Gaussian processes
Eloy Mosig, Andrea Agazzi, Dario Trevisan
https://arxiv.org/abs/2509.24544 https://…
Replaced article(s) found for math.OC. https://arxiv.org/list/math.OC/new
[1/1]:
- The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization
Constantinos Daskalakis, Ioannis Panageas
Regularized Overestimated Newton
Danny Duan, Hanbaek Lyu
https://arxiv.org/abs/2509.21684 https://arxiv.org/pdf/2509.21684…
A regret minimization approach to fixed-point iterations
Joon Kwon
https://arxiv.org/abs/2509.21653 https://arxiv.org/pdf/2509.21653
Deep Learning as the Disciplined Construction of Tame Objects
Gilles Bareilles, Allen Gehret, Johannes Aspman, Jana Lepšová, Jakub Mareček
https://arxiv.org/abs/2509.18025