Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_statML_bot@mastoxiv.page
2025-10-15 09:49:02

Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
Jiaqi Li, Zhipeng Lou, Johannes Schmidt-Hieber, Wei Biao Wu
arxiv.org/abs/2510.12013

@arXiv_mathOC_bot@mastoxiv.page
2025-10-14 11:44:38

Adaptive Conditional Gradient Descent
Abbas Khademi, Antonio Silveti-Falls
arxiv.org/abs/2510.11440 arxiv.org/pdf/2510.11440

@arXiv_mathOC_bot@mastoxiv.page
2025-10-15 09:08:11

New Classes of Non-monotone Variational Inequality Problems Solvable via Proximal Gradient on Smooth Gap Functions
Lei Zhao, Daoli Zhu, Shuzhong Zhang
arxiv.org/abs/2510.12105

@arXiv_mathNA_bot@mastoxiv.page
2025-10-10 08:55:59

Stochastic Gradient Descent for Incomplete Tensor Linear Systems
Anna Ma, Deanna Needell, Alexander Xue
arxiv.org/abs/2510.07630 arxiv.org/…

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:26:09

Correlating Cross-Iteration Noise for DP-SGD using Model Curvature
Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng
arxiv.org/abs/2510.05416

@arXiv_csGT_bot@mastoxiv.page
2025-10-07 07:46:28

On the $O(1/T)$ Convergence of Alternating Gradient Descent-Ascent in Bilinear Games
Tianlong Nan, Shuvomoy Das Gupta, Garud Iyengar, Christian Kroer
arxiv.org/abs/2510.03855

@arXiv_mathOC_bot@mastoxiv.page
2025-11-14 09:41:00

Minimizing smooth Kurdyka-Łojasiewicz functions via generalized descent methods: Convergence rate and complexity
Masoud Ahookhosh, Susan Ghaderi, Alireza Kabgani, Morteza Rahimi
arxiv.org/abs/2511.10414 arxiv.org/pdf/2511.10414 arxiv.org/html/2511.10414
arXiv:2511.10414v1 Announce Type: new
Abstract: This paper addresses the generalized descent algorithm (DEAL) for minimizing smooth functions, analyzed under the Kurdyka-Łojasiewicz (KL) inequality. In particular, the suggested algorithm guarantees sufficient decrease by adapting to the cost function's geometry. We leverage the KL property to establish global convergence, convergence rates, and complexity, with a particular focus on the linear convergence of generalized descent methods. We show that constant step-size and Armijo line search strategies along a generalized descent direction satisfy our generalized descent condition. Additionally, for nonsmooth functions, by leveraging smoothing techniques such as forward-backward and high-order Moreau envelopes, we show that the boosted proximal gradient method (BPGA) and the boosted high-order proximal-point method (BPPA) are also specific cases of DEAL. Notably, if the order of the high-order proximal term is chosen in a certain way (depending on the KL exponent), then the sequence generated by BPPA converges linearly for an arbitrary KL exponent. Our preliminary numerical experiments on inverse problems and LASSO demonstrate the efficiency of the proposed methods, validating our theoretical findings.
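
As a rough illustration of the sufficient-decrease idea this abstract builds on (a minimal sketch, not the paper's DEAL algorithm): gradient descent with an Armijo backtracking line search, which the abstract names as one strategy satisfying the generalized descent condition. The function name, constants, and quadratic test problem below are illustrative assumptions.

import numpy as np

def armijo_descent(f, grad, x0, alpha0=1.0, c=1e-4, shrink=0.5,
                   tol=1e-8, max_iter=1000):
    """Gradient descent with Armijo backtracking (illustrative sketch).

    Each accepted step satisfies the sufficient-decrease condition
        f(x + t*d) <= f(x) + c * t * <grad f(x), d>,   d = -grad f(x),
    which is the kind of descent condition the abstract refers to.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                      # steepest-descent direction
        t = alpha0
        while f(x + t * d) > f(x) + c * t * g.dot(d):
            t *= shrink             # backtrack until sufficient decrease holds
        x = x + t * d
    return x

# Toy usage: a smooth strongly convex quadratic (a function satisfying the KL inequality).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(armijo_descent(f, grad, np.zeros(2)))   # approaches the minimizer A^{-1} b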

@arXiv_csSD_bot@mastoxiv.page
2025-10-13 08:23:10

Audible Networks: Deconstructing and Manipulating Sounds with Deep Non-Negative Autoencoders
Juan José Burred, Carmine-Emanuele Cella
arxiv.org/abs/2510.08816

@arXiv_statML_bot@mastoxiv.page
2025-10-08 09:29:19

On the Theory of Continual Learning with Gradient Descent for Neural Networks
Hossein Taheri, Avishek Ghosh, Arya Mazumdar
arxiv.org/abs/2510.05573

@arXiv_csCG_bot@mastoxiv.page
2025-10-13 07:31:30

Randomized HyperSteiner: A Stochastic Delaunay Triangulation Heuristic for the Hyperbolic Steiner Minimal Tree
Aniss Aiman Medbouhi, Alejandro García-Castellanos, Giovanni Luca Marchetti, Daniel Pelt, Erik J Bekkers, Danica Kragic
arxiv.org/abs/2510.09328

@arXiv_mathOC_bot@mastoxiv.page
2025-11-14 09:19:00

Global Convergence of Four-Layer Matrix Factorization under Random Initialization
Minrui Luo, Weihang Xu, Xiang Gao, Maryam Fazel, Simon Shaolei Du
arxiv.org/abs/2511.09925 arxiv.org/pdf/2511.09925 arxiv.org/html/2511.09925
arXiv:2511.09925v1 Announce Type: new
Abstract: Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of gradient descent dynamics, and extends previous theories to characterize the change in eigenvalues of layer weights.
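
A minimal sketch of the setting studied here: randomly initialized gradient descent on the four-layer factorization objective 0.5 * ||W4 W3 W2 W1 - M||_F^2. Assumptions flagged: the loss below omits the paper's balanced regularization term, and the dimensions, step size, initialization scale, and iteration count are illustrative rather than the paper's choices.

import numpy as np

rng = np.random.default_rng(0)

def four_layer_gd(M, dims, lr=0.02, init_scale=0.2, n_steps=5000):
    """Gradient descent on 0.5 * ||W4 @ W3 @ W2 @ W1 - M||_F^2
    from small random initialization (illustrative sketch only)."""
    d0, d1, d2, d3, d4 = dims
    W1 = init_scale * rng.standard_normal((d1, d0))
    W2 = init_scale * rng.standard_normal((d2, d1))
    W3 = init_scale * rng.standard_normal((d3, d2))
    W4 = init_scale * rng.standard_normal((d4, d3))
    for _ in range(n_steps):
        R = W4 @ W3 @ W2 @ W1 - M          # residual
        G1 = (W4 @ W3 @ W2).T @ R          # gradients of the product loss
        G2 = (W4 @ W3).T @ R @ W1.T
        G3 = W4.T @ R @ (W2 @ W1).T
        G4 = R @ (W3 @ W2 @ W1).T
        W1 -= lr * G1; W2 -= lr * G2; W3 -= lr * G3; W4 -= lr * G4
    return W1, W2, W3, W4

# Toy target: a rank-2 matrix, square layers of width 6.
U = rng.standard_normal((6, 2)); V = rng.standard_normal((2, 6))
M = U @ V
W1, W2, W3, W4 = four_layer_gd(M, dims=(6, 6, 6, 6, 6))
print(np.linalg.norm(W4 @ W3 @ W2 @ W1 - M))   # residual norm after training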

@arXiv_condmatstatmech_bot@mastoxiv.page
2025-09-22 08:23:41

Training thermodynamic computers by gradient descent
Stephen Whitelam
arxiv.org/abs/2509.15324 arxiv.org/pdf/2509.15324

@arXiv_mathOC_bot@mastoxiv.page
2025-11-14 10:01:50

Low-Discrepancy Set Post-Processing via Gradient Descent
François Clément, Linhang Huang, Woorim Lee, Cole Smidt, Braeden Sodt, Xuan Zhang
arxiv.org/abs/2511.10496 arxiv.org/pdf/2511.10496 arxiv.org/html/2511.10496
arXiv:2511.10496v1 Announce Type: new
Abstract: The construction of low-discrepancy sets, used for uniform sampling and numerical integration, has recently seen great improvements based on optimization and machine learning techniques. However, these methods are computationally expensive, often requiring days of computation or access to GPU clusters. We show that simple gradient descent-based techniques allow for comparable results when starting with a reasonably uniform point set. Not only is this method much more efficient and accessible, but it can be applied as post-processing to any low-discrepancy set generation method for a variety of standard discrepancy measures.
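
A hedged sketch of the post-processing idea: run gradient descent on a differentiable discrepancy measure, starting from a reasonably uniform point set. The L2 star discrepancy (Warnock's formula), the jittered-grid starting set, and all hyperparameters below are stand-in assumptions; the discrepancy measures and initial sets used in the paper may differ.

import torch

def l2_star_discrepancy_sq(x):
    """Squared L2 star discrepancy of points x in [0,1]^d via Warnock's formula.
    x: (n, d) tensor. Differentiable almost everywhere, so plain gradient
    descent can be applied to it directly."""
    n, d = x.shape
    term1 = (1.0 / 3.0) ** d
    term2 = (2.0 / n) * torch.prod((1.0 - x ** 2) / 2.0, dim=1).sum()
    pair_max = torch.maximum(x.unsqueeze(1), x.unsqueeze(0))   # (n, n, d) pairwise maxima
    term3 = torch.prod(1.0 - pair_max, dim=2).sum() / n ** 2
    return term1 - term2 + term3

def post_process(x0, lr=10.0, n_steps=1000):
    """Gradient-descent post-processing of an initial point set x0.
    Gradients of the discrepancy are O(1/n), hence the large nominal step size."""
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        l2_star_discrepancy_sq(x).backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)        # keep the points inside the unit cube
    return x.detach()

# Start from a "reasonably uniform" set: a jittered 16x16 grid in [0,1]^2.
torch.manual_seed(0)
g = (torch.arange(16).float() + 0.5) / 16
x0 = (torch.cartesian_prod(g, g) + 0.01 * torch.randn(256, 2)).clamp(0, 1)
print(l2_star_discrepancy_sq(x0).item(), l2_star_discrepancy_sq(post_process(x0)).item())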

@arXiv_mathST_bot@mastoxiv.page
2025-09-30 08:57:31

Learning single index model with gradient descent: spectral initialization and precise asymptotics
Yuchen Chen, Yandi Shen
arxiv.org/abs/2509.23527

@arXiv_statML_bot@mastoxiv.page
2025-10-03 09:42:31

Adaptive Kernel Selection for Stein Variational Gradient Descent
Moritz Melcher, Simon Weissmann, Ashia C. Wilson, Jakob Zech
arxiv.org/abs/2510.02067

@arXiv_csCE_bot@mastoxiv.page
2025-10-07 07:47:18

Towards Fast Option Pricing PDE Solvers Powered by PIELM
Akshay Govind Srinivasan, Anuj Jagannath Said, Sathwik Pentela, Vikas Dwivedi, Balaji Srinivasan
arxiv.org/abs/2510.04322

@arXiv_statME_bot@mastoxiv.page
2025-09-29 09:11:27

Federated Learning of Quantile Inference under Local Differential Privacy
Leheng Cai, Qirui Hu, Shuyuan Wu
arxiv.org/abs/2509.21800 arxiv.o…

@arXiv_csLG_bot@mastoxiv.page
2025-10-03 11:04:01

Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
Runqian Wang, Yilun Du
arxiv.org/abs/2510.02300 arxiv.org/pdf/2…

@arXiv_csLO_bot@mastoxiv.page
2025-09-25 12:40:17

Replaced article(s) found for cs.LO. arxiv.org/list/cs.LO/new
[1/1]:
- Compact Rule-Based Classifier Learning via Gradient Descent
Javier Fumanal-Idocin, Raquel Fernandez-Peralta, Javier Andreu-Perez

@arXiv_mathOC_bot@mastoxiv.page
2025-10-06 09:27:49

Quantitative Convergence Analysis of Projected Stochastic Gradient Descent for Non-Convex Losses via the Goldstein Subdifferential
Yuping Zheng, Andrew Lamperski
arxiv.org/abs/2510.02735

@arXiv_statML_bot@mastoxiv.page
2025-09-30 08:48:11

Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Haodong Liang, Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai
arxiv.org/abs/2509.22794

@arXiv_mathOC_bot@mastoxiv.page
2025-11-14 09:37:10

S-D-RSM: Stochastic Distributed Regularized Splitting Method for Large-Scale Convex Optimization Problems
Maoran Wang, Xingju Cai, Yongxin Chen
arxiv.org/abs/2511.10133 arxiv.org/pdf/2511.10133 arxiv.org/html/2511.10133
arXiv:2511.10133v1 Announce Type: new
Abstract: This paper investigates large-scale distributed composite convex optimization problems, with motivation from a broad range of applications including multi-agent systems, federated learning, smart grids, wireless sensor networks, and compressed sensing. Stochastic gradient descent (SGD) and its variants are commonly employed to solve such problems. However, existing algorithms often rely on vanishing step sizes, strong convexity assumptions, or entail substantial computational overhead to ensure convergence or obtain favorable complexity. To bridge the gap between theory and practice, we integrate consensus optimization and operator splitting techniques (see Problem Reformulation) to develop a novel stochastic splitting algorithm, termed the \emph{stochastic distributed regularized splitting method} (S-D-RSM). In practice, S-D-RSM performs parallel updates of proximal mappings and gradient information for only a randomly selected subset of agents at each iteration. By introducing regularization terms, it effectively mitigates consensus discrepancies among distributed nodes. In contrast to conventional stochastic methods, our theoretical analysis establishes that S-D-RSM achieves global convergence without requiring diminishing step sizes or strong convexity assumptions. Furthermore, it achieves an iteration complexity of $\mathcal{O}(1/\epsilon)$ with respect to both the objective function value and the consensus error. Numerical experiments show that S-D-RSM achieves up to 2--3$\times$ speedup compared to state-of-the-art baselines, while maintaining comparable or better accuracy. These results not only validate the algorithm's theoretical guarantees but also demonstrate its effectiveness in practical tasks such as compressed sensing and empirical risk minimization.
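
To illustrate the pattern this abstract describes (only a randomly selected subset of agents updates at each iteration, with regularization pulling each local copy toward a consensus variable), here is a toy Python sketch on distributed least squares. It is not S-D-RSM itself: the gradient-based local update, the parameters rho and lr, and the averaging step for the consensus variable are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

def randomized_consensus_step(local_A, local_b, x_local, x_bar,
                              agents, lr=0.01, rho=1.0):
    """One iteration of a randomized consensus-regularized scheme
    (illustrative pattern only, not the paper's S-D-RSM updates).

    Each selected agent i takes a gradient step on
        f_i(x) + (rho / 2) * ||x - x_bar||^2,
    i.e. its local least-squares loss plus a regularizer that pulls its
    local copy toward the current consensus variable x_bar."""
    for i in agents:
        g = local_A[i].T @ (local_A[i] @ x_local[i] - local_b[i])
        g += rho * (x_local[i] - x_bar)
        x_local[i] = x_local[i] - lr * g
    return x_local, x_local.mean(axis=0)   # refresh the consensus variable

# Toy distributed least squares: 8 agents, each holding 20 local rows.
n_agents, m, d = 8, 20, 5
x_true = rng.standard_normal(d)
local_A = rng.standard_normal((n_agents, m, d))
local_b = local_A @ x_true + 0.01 * rng.standard_normal((n_agents, m))
x_local, x_bar = np.zeros((n_agents, d)), np.zeros(d)
for _ in range(500):
    agents = rng.choice(n_agents, size=3, replace=False)   # random subset per iteration
    x_local, x_bar = randomized_consensus_step(local_A, local_b, x_local, x_bar, agents)
print(np.linalg.norm(x_bar - x_true))   # consensus estimate approaches x_true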

@arXiv_csAI_bot@mastoxiv.page
2025-09-18 09:08:41

From Next Token Prediction to (STRIPS) World Models -- Preliminary Results
Carlos Núñez-Molina, Vicenç Gómez, Hector Geffner
arxiv.org/abs/2509.13389

@arXiv_csLG_bot@mastoxiv.page
2025-09-18 10:15:41

A Universal Banach--Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training
Johnny R. Zhang (Independent Researcher), Xiaomei Mi (University of Manchester), Gaoyuan Du (Amazon), Qianyi Sun (Microsoft), Shiqi Wang (Meta), Jiaxuan Li (Amazon), Wenhua Zhou (Independent Researcher)
arx…

@arXiv_csCE_bot@mastoxiv.page
2025-10-03 11:05:23

Crosslisted article(s) found for cs.CE. arxiv.org/list/cs.CE/new
[1/1]:
- Fast training of accurate physics-informed neural networks without gradient descent
Datar, Kapoor, Chandra, Sun, Bolager, Burak, Veselovska, Fornasier, Dietrich

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:33:31

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
Yuen Chen, Yian Wang, Hari Sundaram
arxiv.org/abs/2509.16173

@arXiv_mathST_bot@mastoxiv.page
2025-09-30 16:50:46

Crosslisted article(s) found for math.ST. arxiv.org/list/math.ST/new
[1/1]:
- Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Haodong Liang, Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai

@arXiv_statML_bot@mastoxiv.page
2025-10-06 12:39:12

Replaced article(s) found for stat.ML. arxiv.org/list/stat.ML/new
[2/2]:
- Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Shuang Liang, Guido Montúfar

@arXiv_mathOC_bot@mastoxiv.page
2025-09-18 09:06:51

Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds
Mert G\"urb\"uzbalaban, Yasa Syed, Necdet Serhat Aybat
arxiv.org/abs/2509.13628

@arXiv_csSD_bot@mastoxiv.page
2025-09-22 10:01:11

Reverse Engineering of Music Mixing Graphs with Differentiable Processors and Iterative Pruning
Sungho Lee, Marco Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji
arxiv.org/abs/2509.15948

@arXiv_statML_bot@mastoxiv.page
2025-09-29 08:58:37

Effective continuous equations for adaptive SGD: a stochastic analysis view
Luca Callisti, Marco Romito, Francesco Triggiano
arxiv.org/abs/2509.21614

@arXiv_statML_bot@mastoxiv.page
2025-09-23 09:37:40

Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization
Jingfeng Wu, Peter L. Bartlett, Jason D. Lee, Sham M. Kakade, Bin Yu
arxiv.org/abs/2509.17251

@arXiv_mathST_bot@mastoxiv.page
2025-09-30 10:18:21

Autoregressive Processes on Stiefel and Grassmann Manifolds
Jordi-Lluís Figueras, Aron Persson
arxiv.org/abs/2509.24767 arxiv.org/pdf…

@arXiv_mathOC_bot@mastoxiv.page
2025-10-07 10:41:52

A Frank-Wolfe Algorithm for Strongly Monotone Variational Inequalities
Reza Rahimi Baghbadorani, Peyman Mohajerin Esfahani, Sergio Grammatico
arxiv.org/abs/2510.03842

@arXiv_mathOC_bot@mastoxiv.page
2025-10-07 10:40:42

Learning Polynomial Activation Functions for Deep Neural Networks
Linghao Zhang, Jiawang Nie, Tingting Tang
arxiv.org/abs/2510.03682 arxiv.…

@arXiv_statML_bot@mastoxiv.page
2025-10-02 09:02:31

Guaranteed Noisy CP Tensor Recovery via Riemannian Optimization on the Segre Manifold
Ke Xu, Yuefeng Han
arxiv.org/abs/2510.00569 arxiv.org…

@arXiv_statML_bot@mastoxiv.page
2025-09-30 11:12:51

Quantitative convergence of trained single layer neural networks to Gaussian processes
Eloy Mosig, Andrea Agazzi, Dario Trevisan
arxiv.org/abs/2509.24544

@arXiv_mathOC_bot@mastoxiv.page
2025-09-29 15:15:58

Replaced article(s) found for math.OC. arxiv.org/list/math.OC/new
[1/1]:
- The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization
Constantinos Daskalakis, Ioannis Panageas

@arXiv_mathOC_bot@mastoxiv.page
2025-09-29 08:23:57

Regularized Overestimated Newton
Danny Duan, Hanbaek Lyu
arxiv.org/abs/2509.21684 arxiv.org/pdf/2509.21684

@arXiv_mathOC_bot@mastoxiv.page
2025-09-29 08:22:57

A regret minimization approach to fixed-point iterations
Joon Kwon
arxiv.org/abs/2509.21653 arxiv.org/pdf/2509.21653

@arXiv_mathOC_bot@mastoxiv.page
2025-09-23 10:17:00

Deep Learning as the Disciplined Construction of Tame Objects
Gilles Bareilles, Allen Gehret, Johannes Aspman, Jana Lepšová, Jakub Mareček
arxiv.org/abs/2509.18025