MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization
Yichen Han, Bojun Liu, Zhengpeng Zhou, Guanyu Liu, Zeng Zhang, Yang Yang, Wenli Wang, Isaac N Shi, Yunyan, Lewei He, Tianyu Shi
https://arxiv.org/abs/2509.11361
A Parallelizable Approach for Characterizing NE in Zero-Sum Games After a Linear Number of Iterations of Gradient Descent
Taemin Kim, James P. Bailey
https://arxiv.org/abs/2507.11366
Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration
Ahmed Khaled, Satyen Kale, Arthur Douillard, Chi Jin, Rob Fergus, Manzil Zaheer
https://arxiv.org/abs/2509.10439
Deep Equilibrium models for Poisson Imaging Inverse problems via Mirror Descent
Christian Daniele, Silvia Villa, Samuel Vaiter, Luca Calatroni
https://arxiv.org/abs/2507.11461
A Differentiable Surrogate Model for the Generation of Radio Pulses from In-Ice Neutrino Interactions
Philipp Pilar, Martin Ravn, Christian Glaser, Niklas Wahlström
https://arxiv.org/abs/2509.10274
Discovery of energy landscapes towards optimized quantum transport: Environmental effects and long-range tunneling
Maggie Lawrence, Matthew Pocrnic, Erin Fung, Juan Carrasquilla, Erik M. Gauger, Dvira Segal
https://arxiv.org/abs/2508.09371
Randomized HyperSteiner: A Stochastic Delaunay Triangulation Heuristic for the Hyperbolic Steiner Minimal Tree
Aniss Aiman Medbouhi, Alejandro García-Castellanos, Giovanni Luca Marchetti, Daniel Pelt, Erik J Bekkers, Danica Kragic
https://arxiv.org/abs/2510.09328
PLRV-O: Advancing Differentially Private Deep Learning via Privacy Loss Random Variable Optimization
Qin Yang, Nicholas Stout, Meisam Mohammady, Han Wang, Ayesha Samreen, Christopher J Quinn, Yan Yan, Ashish Kundu, Yuan Hong
https://arxiv.org/abs/2509.06264
Projected Gradient Descent for Constrained Decision-Dependent Optimization
Zifan Wang, Changxin Liu, Thomas Parisini, Michael M. Zavlanos, Karl H. Johansson
https://arxiv.org/abs/2508.08856
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML
Haoyu Dong, Pengkun Zhang, Mingzhe Lu, Yanzhen Shen, Guolin Ke
https://arxiv.org/abs/2509.06806
On the $O(1/T)$ Convergence of Alternating Gradient Descent-Ascent in Bilinear Games
Tianlong Nan, Shuvomoy Das Gupta, Garud Iyengar, Christian Kroer
https://arxiv.org/abs/2510.03855
Randomized coordinate gradient descent almost surely escapes strict saddle points
Ziang Chen, Yingzhou Li, Zihao Li
https://arxiv.org/abs/2508.07535
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
A F M Saif, Lisha Chen, Xiaodong Cui, Songtao Lu, Brian Kingsbury, Tianyi Chen
https://arxiv.org/abs/2508.09228
Phase diagram and eigenvalue dynamics of stochastic gradient descent in multilayer neural networks
Chanju Park (Swansea University), Biagio Lucini (Queen Mary University of London), Gert Aarts (Swansea University)
https://arxiv.org/abs/2509.01349
Comparative Analysis of Novel NIRMAL Optimizer Against Adam and SGD with Momentum
Nirmal Gaud, Surej Mouli, Preeti Katiyar, Vaduguru Venkata Ramya
https://arxiv.org/abs/2508.04293
Replaced article(s) found for cs.LO. https://arxiv.org/list/cs.LO/new
[1/1]:
- Compact Rule-Based Classifier Learning via Gradient Descent
Javier Fumanal-Idocin, Raquel Fernandez-Peralta, Javier Andreu-Perez
Correlating Cross-Iteration Noise for DP-SGD using Model Curvature
Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng
https://arxiv.org/abs/2510.05416
Towards Fast Option Pricing PDE Solvers Powered by PIELM
Akshay Govind Srinivasan, Anuj Jagannath Said, Sathwik Pentela, Vikas Dwivedi, Balaji Srinivasan
https://arxiv.org/abs/2510.04322
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Haodong Liang, Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai
https://arxiv.org/abs/2509.22794
Linear Convergence of Gradient Descent for Quadratically Regularized Optimal Transport
Alberto González-Sanz, Marcel Nutz, Andrés Riveros Valdevenito
https://arxiv.org/abs/2509.08547
Information Entropy-Based Scheduling for Communication-Efficient Decentralized Learning
Jaiprakash Nagar, Zheng Chen, Marios Kountouris, Photios A. Stavrou
https://arxiv.org/abs/2507.17426
On the Perturbed Projection-Based Distributed Gradient-Descent Algorithm: A Fully-Distributed Adaptive Redesign
Tarek Bazizi, Mohamed Maghenem, Paolo Frasca, Antonio Loría, Elena Panteley
https://arxiv.org/abs/2509.03443
Quantitative Convergence Analysis of Projected Stochastic Gradient Descent for Non-Convex Losses via the Goldstein Subdifferential
Yuping Zheng, Andrew Lamperski
https://arxiv.org/abs/2510.02735
GRADSTOP: Early Stopping of Gradient Descent via Posterior Sampling
Arash Jamshidi, Lauri Seppäläinen, Katsiaryna Haitsiukevich, Hoang Phuc Hau Luu, Anton Björklund, Kai Puolamäki
https://arxiv.org/abs/2508.19028
Lightweight Gradient Descent Optimization for Mitigating Hardware Imperfections in RIS Systems
Pedro H. C. de Souza (National Institute of Telecommunications), Luiz A. M. Pereira (National Institute of Telecommunications), Faustino R. Gómez (National Institute of Telecommunications), Elsa M. Materón (National Institute of Telecommunications), Jorge Ricardo Mejía-Salazar (National Institute of Telecommunications)
Crosslisted article(s) found for cs.CE. https://arxiv.org/list/cs.CE/new
[1/1]:
- Fast training of accurate physics-informed neural networks without gradient descent
Datar, Kapoor, Chandra, Sun, Bolager, Burak, Veselovska, Fornasier, Dietrich
Harmonized Gradient Descent for Class Imbalanced Data Stream Online Learning
Han Zhou, Hongpeng Yin, Xuanhong Deng, Yuyu Huang, Hao Ren
https://arxiv.org/abs/2508.11353
Replaced article(s) found for stat.ML. https://arxiv.org/list/stat.ML/new
[2/2]:
- Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Shuang Liang, Guido Montúfar
Towards understanding Accelerated Stein Variational Gradient Flow – Analysis of Generalized Bilinear Kernels for Gaussian target distributions
Viktor Stein, Wuchen Li
https://arxiv.org/abs/2509.04008
Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization
Jingfeng Wu, Peter L. Bartlett, Jason D. Lee, Sham M. Kakade, Bin Yu
https://arxiv.org/abs/2509.17251
Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds
Mert G\"urb\"uzbalaban, Yasa Syed, Necdet Serhat Aybat
https://arxiv.org/abs/2509.13628
A Universal Banach–Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training
Johnny R. Zhang (Independent Researcher), Xiaomei Mi (University of Manchester), Gaoyuan Du (Amazon), Qianyi Sun (Microsoft), Shiqi Wang (Meta), Jiaxuan Li (Amazon), Wenhua Zhou (Independent Researcher)
https://arx…
Stochastic optimization powers the scalability of modern artificial intelligence, spanning machine learning, deep learning, reinforcement learning, and large language model training. Yet, existing theory remains largely confined to Hilbert spaces, relying on inner-product frameworks and orthogonality. This paradigm fails to capture non-Euclidean settings, such as mirror descent on simplices, Bregman proximal methods for sparse learning, natural gradient descent in information geometry, or Kullb…
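As a minimal illustration of the first non-Euclidean setting the abstract names, entropic mirror descent on the probability simplex replaces the Euclidean gradient step with a multiplicative update induced by the negative-entropy Bregman divergence. The sketch below is a generic textbook instance in Python (NumPy only), not the paper's framework; the function name and step size are illustrative.

import numpy as np

def mirror_descent_simplex(grad, x0, step=0.5, iters=200):
    # Entropic mirror descent (exponentiated gradient): the negative-entropy
    # Bregman geometry turns the gradient step into a multiplicative update,
    # and renormalization plays the role of projection onto the simplex.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x * np.exp(-step * grad(x))  # step in dual (log) coordinates
        x /= x.sum()                     # Bregman projection back onto the simplex
    return x

# Toy usage: minimize the linear objective <c, x> over the simplex;
# the iterates concentrate on the smallest coordinate of c.
c = np.array([0.9, 0.1, 0.5])
print(mirror_descent_simplex(lambda x: c, np.ones(3) / 3))  # ~ [0., 1., 0.]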
Escaping Saddle Points via Curvature-Calibrated Perturbations: A Complete Analysis with Explicit Constants and Empirical Validation
Faruk Alpay, Hamdi Alakkad
https://arxiv.org/abs/2508.16540
Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu
https://arxiv.org/abs/2508.14853
From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Łojasiewicz Regions
Agnideep Aich, Ashit Baran Aich, Bruce Wade
https://arxiv.org/abs/2507.21429
A Frank-Wolfe Algorithm for Strongly Monotone Variational Inequalities
Reza Rahimi Baghbadorani, Peyman Mohajerin Esfahani, Sergio Grammatico
https://arxiv.org/abs/2510.03842
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/4]:
- Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence
Adwait Datar, Nihat Ay
Norm-Constrained Flows and Sign-Based Optimization: Theory and Algorithms
Valentin Leplat, Sergio Mayorga, Roland Hildebrand, Alexander Gasnikov
https://arxiv.org/abs/2508.18510
Replaced article(s) found for math.OC. https://arxiv.org/list/math.OC/new
[1/1]:
- FastPart: Over-Parameterized Stochastic Gradient Descent for Sparse optimisation on Measures
Yohann De Castro, Sébastien Gadat, Clément Marteau
Replaced article(s) found for math.OC. https://arxiv.org/list/math.OC/new
[1/1]:
- The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization
Constantinos Daskalakis, Ioannis Panageas
Active-set Newton-MR methods for nonconvex optimization problems with bound constraints
Ernesto G. Birgin, Geovani N. Grapiglia, Diaulas S. Marcondes
https://arxiv.org/abs/2508.20967
Polyak Stepsize: Estimating Optimal Functional Values Without Parameters or Prior Knowledge
Farshed Abdukhakimov, Cuong Anh Pham, Samuel Horváth, Martin Takáč, Slavomír Hanzely
https://arxiv.org/abs/2508.17288