Minimizing smooth Kurdyka-{\L}ojasiewicz functions via generalized descent methods: Convergence rate and complexity
Masoud Ahookhosh, Susan Ghaderi, Alireza Kabgani, Morteza Rahimi
https://arxiv.org/abs/2511.10414 https://arxiv.org/pdf/2511.10414 https://arxiv.org/html/2511.10414
arXiv:2511.10414v1 Announce Type: new
Abstract: This paper addresses the generalized descent algorithm (DEAL) for minimizing smooth functions, which is analyzed under the Kurdyka-{\L}ojasiewicz (KL) inequality. In particular, the suggested algorithm guarantees a sufficient decrease by adapting to the cost function's geometry. We leverage the KL property to establish global convergence, convergence rates, and complexity bounds. A particular focus is placed on the linear convergence of generalized descent methods. We show that constant step-size and Armijo line search strategies along a generalized descent direction satisfy our generalized descent condition. Additionally, for nonsmooth functions, by leveraging smoothing techniques such as the forward-backward and high-order Moreau envelopes, we show that the boosted proximal gradient (BPGA) and boosted high-order proximal-point (BPPA) methods are also special cases of DEAL. Notably, if the order of the high-order proximal term is chosen appropriately (depending on the KL exponent), then the sequence generated by BPPA converges linearly for an arbitrary KL exponent. Our preliminary numerical experiments on inverse problems and LASSO demonstrate the efficiency of the proposed methods, validating our theoretical findings.
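Since the abstract names Armijo line search along a descent direction as one instance of its generalized descent condition, here is a minimal, self-contained sketch of a descent method enforcing the classical Armijo sufficient-decrease condition along the steepest-descent direction. DEAL's condition and admissible directions are more general than this; the objective, parameters, and stopping rule below are illustrative assumptions, not the paper's algorithm.

import numpy as np

def armijo_descent(f, grad, x0, sigma=1e-4, beta=0.5, t0=1.0, tol=1e-8, max_iter=1000):
    """Descent with Armijo backtracking: accept t once f(x + t*d) <= f(x) + sigma*t*<grad f(x), d>."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                          # steepest descent, a valid descent direction
        fx, slope = f(x), float(g @ d)  # slope < 0 because d is a descent direction
        t = t0
        while f(x + t * d) > fx + sigma * t * slope:
            t *= beta                   # backtrack until sufficient decrease holds
        x = x + t * d
    return x

# Example: a strongly convex quadratic.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(armijo_descent(f, grad, np.zeros(2)))   # approaches the solution of A x = b

A strongly convex quadratic such as the one above satisfies the KL inequality with exponent 1/2, which is the regime in which linear convergence of descent methods is expected.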
Polyharmonic Cascade
Yuriy N. Bakhvalov
https://arxiv.org/abs/2512.17671 https://arxiv.org/pdf/2512.17671 https://arxiv.org/html/2512.17671
arXiv:2512.17671v1 Announce Type: new
Abstract: This paper presents a deep machine learning architecture, the "polyharmonic cascade" -- a sequence of packages of polyharmonic splines, where each layer is rigorously derived from the theory of random functions and the principle of indifference. This makes it possible to approximate nonlinear functions of arbitrary complexity while preserving global smoothness and a probabilistic interpretation. For the polyharmonic cascade, a training method alternative to gradient descent is proposed: instead of directly optimizing the coefficients, a single global linear system is solved on each batch with respect to the function values at fixed "constellations" of nodes. This yields synchronized updates of all layers, preserves the probabilistic interpretation of individual layers and theoretical consistency with the original model, and scales well: all computations reduce to 2D matrix operations executed efficiently on a GPU. Fast learning without overfitting is demonstrated on MNIST.
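As a rough illustration of fitting a polyharmonic-spline layer by solving one linear system for coefficients at fixed nodes (the abstract's "constellations"), here is a hedged sketch. It covers only a single thin-plate-spline layer in 2D with a ridge-regularized least-squares solve, not the cascade architecture or its probabilistic derivation; the kernel choice, node placement, and regularization are assumptions of this sketch.

import numpy as np

def thin_plate_kernel(X, Y):
    """Polyharmonic kernel phi(r) = r^2 * log(r) (the 2D thin-plate spline)."""
    r = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0.0, r ** 2 * np.log(r), 0.0)

def fit_layer(nodes, X, y, reg=1e-8):
    """One spline layer: solve a single ridge-regularized linear system for the node coefficients."""
    Phi = thin_plate_kernel(X, nodes)                  # design matrix of the batch against the nodes
    G = Phi.T @ Phi + reg * np.eye(len(nodes))
    return np.linalg.solve(G, Phi.T @ y)

def predict(nodes, coeffs, X):
    return thin_plate_kernel(X, nodes) @ coeffs

# Toy usage: approximate a smooth nonlinear 2D function from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) * np.cos(2.0 * X[:, 1]) + 0.01 * rng.standard_normal(200)
nodes = rng.uniform(-1.0, 1.0, size=(30, 2))           # fixed "constellation" of nodes (assumption)
coeffs = fit_layer(nodes, X, y)
print("train RMSE:", np.sqrt(np.mean((predict(nodes, coeffs, X) - y) ** 2)))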
Global Convergence of Four-Layer Matrix Factorization under Random Initialization
Minrui Luo, Weihang Xu, Xiang Gao, Maryam Fazel, Simon Shaolei Du
https://arxiv.org/abs/2511.09925 https://arxiv.org/pdf/2511.09925 https://arxiv.org/html/2511.09925
arXiv:2511.09925v1 Announce Type: new
Abstract: Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of gradient descent dynamics, and extends previous theories to characterize the evolution of the eigenvalues of the layer weights.
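For concreteness, a minimal sketch of the setting the abstract analyzes: randomly initialized gradient descent on a four-layer matrix factorization objective with a standard balanced regularization term. The dimensions, initialization scale, step size, and regularization weight below are illustrative assumptions; the paper's precise conditions on the target matrix are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
n, r = 10, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))     # rank-r target matrix

Ws = [0.3 * rng.standard_normal((n, n)) for _ in range(4)]         # random initialization
lr, lam, iters = 2e-3, 1e-2, 20000

def loss(Ws):
    W1, W2, W3, W4 = Ws
    fit = 0.5 * np.linalg.norm(W4 @ W3 @ W2 @ W1 - M) ** 2
    bal = sum(np.linalg.norm(Ws[i + 1].T @ Ws[i + 1] - Ws[i] @ Ws[i].T) ** 2 for i in range(3))
    return fit + 0.25 * lam * bal

print("initial loss:", loss(Ws))
for _ in range(iters):
    W1, W2, W3, W4 = Ws
    R = W4 @ W3 @ W2 @ W1 - M                                      # residual of the product
    # Gradients of the fitting term 0.5 * ||W4 W3 W2 W1 - M||_F^2 w.r.t. each factor.
    g = [(W4 @ W3 @ W2).T @ R,
         (W4 @ W3).T @ R @ W1.T,
         W4.T @ R @ (W2 @ W1).T,
         R @ (W3 @ W2 @ W1).T]
    # Gradients of the balanced regularizer (lam/4) * sum_i ||W_{i+1}^T W_{i+1} - W_i W_i^T||_F^2.
    D = [Ws[i + 1].T @ Ws[i + 1] - Ws[i] @ Ws[i].T for i in range(3)]
    g[0] -= lam * D[0] @ W1
    g[1] += lam * (W2 @ D[0] - D[1] @ W2)
    g[2] += lam * (W3 @ D[1] - D[2] @ W3)
    g[3] += lam * W4 @ D[2]
    Ws = [W - lr * gi for W, gi in zip(Ws, g)]
print("final loss:  ", loss(Ws))

The balanced term penalizes mismatch between W_{i+1}^T W_{i+1} and W_i W_i^T, the standard balancedness condition used in deep matrix factorization analyses.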
Low-Discrepancy Set Post-Processing via Gradient Descent
Fran\c{c}ois Cl\'ement, Linhang Huang, Woorim Lee, Cole Smidt, Braeden Sodt, Xuan Zhang
https://arxiv.org/abs/2511.10496 https://arxiv.org/pdf/2511.10496 https://arxiv.org/html/2511.10496
arXiv:2511.10496v1 Announce Type: new
Abstract: The construction of low-discrepancy sets, used for uniform sampling and numerical integration, has recently seen great improvements based on optimization and machine learning techniques. However, these methods are computationally expensive, often requiring days of computation or access to GPU clusters. We show that simple gradient descent-based techniques allow for comparable results when starting with a reasonably uniform point set. Not only is this method much more efficient and accessible, but it can be applied as post-processing to any low-discrepancy set generation method for a variety of standard discrepancy measures.
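A hedged sketch of the post-processing idea: take a point set in [0,1]^d and run projected gradient descent on a standard discrepancy measure, here the L2 star discrepancy computed via Warnock's formula. For brevity the gradient is approximated by finite differences (an analytic or autodiff gradient would be used in a serious implementation), and the initial set, step size, and iteration count are assumptions of this sketch rather than the paper's setup.

import numpy as np

def l2_star_discrepancy_sq(P):
    """Warnock's formula for the squared L2 star discrepancy of points P in [0,1]^d."""
    N, d = P.shape
    term1 = 3.0 ** (-d)
    term2 = (2.0 / N) * np.sum(np.prod((1.0 - P ** 2) / 2.0, axis=1))
    M = 1.0 - np.maximum(P[:, None, :], P[None, :, :])     # 1 - max(x_ik, x_jk)
    term3 = np.sum(np.prod(M, axis=2)) / N ** 2
    return term1 - term2 + term3

def fd_gradient(P, eps=1e-6):
    """Forward-difference gradient of the squared discrepancy w.r.t. all coordinates."""
    base, G = l2_star_discrepancy_sq(P), np.zeros_like(P)
    for i in range(P.shape[0]):
        for k in range(P.shape[1]):
            Q = P.copy()
            Q[i, k] += eps
            G[i, k] = (l2_star_discrepancy_sq(Q) - base) / eps
    return G

rng = np.random.default_rng(0)
P = rng.random((64, 2))                  # stand-in for a "reasonably uniform" starting set
print("before:", np.sqrt(l2_star_discrepancy_sq(P)))
for _ in range(300):
    P = np.clip(P - 2.0 * fd_gradient(P), 0.0, 1.0)    # projected gradient step on [0,1]^2
print("after: ", np.sqrt(l2_star_discrepancy_sq(P)))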
S-D-RSM: Stochastic Distributed Regularized Splitting Method for Large-Scale Convex Optimization Problems
Maoran Wang, Xingju Cai, Yongxin Chen
https://arxiv.org/abs/2511.10133 https://arxiv.org/pdf/2511.10133 https://arxiv.org/html/2511.10133
arXiv:2511.10133v1 Announce Type: new
Abstract: This paper investigates large-scale distributed composite convex optimization problems, motivated by a broad range of applications including multi-agent systems, federated learning, smart grids, wireless sensor networks, and compressed sensing. Stochastic gradient descent (SGD) and its variants are commonly employed to solve such problems. However, existing algorithms often rely on vanishing step sizes, strong convexity assumptions, or substantial computational overhead to ensure convergence or obtain favorable complexity. To bridge the gap between theory and practice, we integrate consensus optimization and operator splitting techniques (see Problem Reformulation) to develop a novel stochastic splitting algorithm, termed the \emph{stochastic distributed regularized splitting method} (S-D-RSM). In practice, S-D-RSM performs parallel updates of proximal mappings and gradient information for only a randomly selected subset of agents at each iteration. By introducing regularization terms, it effectively mitigates consensus discrepancies among distributed nodes. In contrast to conventional stochastic methods, our theoretical analysis establishes that S-D-RSM achieves global convergence without requiring diminishing step sizes or strong convexity assumptions. Furthermore, it achieves an iteration complexity of $\mathcal{O}(1/\epsilon)$ with respect to both the objective function value and the consensus error. Numerical experiments show that S-D-RSM achieves up to 2--3$\times$ speedup over state-of-the-art baselines while maintaining comparable or better accuracy. These results not only validate the algorithm's theoretical guarantees but also demonstrate its effectiveness in practical tasks such as compressed sensing and empirical risk minimization.
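The abstract does not give S-D-RSM's update rule, so the following is only a schematic randomized consensus proximal-gradient loop for the same problem class (a sum of smooth local losses plus an l1 term, as in compressed sensing): a random subset of agents takes a gradient step with a regularization pull toward a consensus variable, which is then updated by averaging and soft-thresholding. The update rule, penalty rho, step size, and data are illustrative assumptions and should not be read as the paper's S-D-RSM.

import numpy as np

rng = np.random.default_rng(0)
m, n, p = 8, 100, 40                 # agents, samples per agent, dimension
lam, rho = 0.1, 1.0                  # l1 weight and consensus-regularization weight (assumptions)

# Synthetic sparse-recovery data split across agents (compressed-sensing style).
x_true = np.zeros(p)
x_true[rng.choice(p, 5, replace=False)] = rng.standard_normal(5)
A = [rng.standard_normal((n, p)) for _ in range(m)]
b = [Ai @ x_true + 0.01 * rng.standard_normal(n) for Ai in A]
step = 0.9 / max(np.linalg.norm(Ai, 2) ** 2 for Ai in A)   # safe step for the local smooth terms

def soft_threshold(v, t):
    """Proximal mapping of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x_local = [np.zeros(p) for _ in range(m)]    # one local copy per agent
z = np.zeros(p)                              # consensus variable

for _ in range(500):
    S = rng.choice(m, size=m // 2, replace=False)          # activate a random subset of agents
    for i in S:
        grad_i = A[i].T @ (A[i] @ x_local[i] - b[i])       # gradient of the local smooth loss
        # Local step: gradient on f_i plus a regularization pull toward the consensus variable.
        x_local[i] -= step * (grad_i + rho * (x_local[i] - z))
    # Consensus step: average the local copies and apply the prox of the l1 term
    # (the step*lam scaling here is schematic).
    z = soft_threshold(np.mean(x_local, axis=0), step * lam)

print("relative error:", np.linalg.norm(z - x_true) / np.linalg.norm(x_true))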