2025-10-14 11:35:58
Nonlinearly Preconditioned Gradient Methods: Momentum and Stochastic Analysis
Konstantinos Oikonomidis, Jan Quan, Panagiotis Patrinos
https://arxiv.org/abs/2510.11312 https://…
Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
Jiaqi Li, Zhipeng Lou, Johannes Schmidt-Hieber, Wei Biao Wu
https://arxiv.org/abs/2510.12013 https://
Forward and backward error bounds for a mixed precision preconditioned conjugate gradient algorithm
Thomas Bake, Erin Carson, Yuxin Ma
https://arxiv.org/abs/2510.11379 https://
Optimal gradient estimates for conductivity problems with imperfect low-conductivity interfaces
Hongjie Dong, Haigang Li, Yan Zhao
https://arxiv.org/abs/2510.10615 https://
Grad-CL: Source Free Domain Adaptation with Gradient Guided Feature Disalignment
Rini Smita Thakur, Rajeev Ranjan Dwivedi, Vinod K Kurmi
https://arxiv.org/abs/2509.10134 https:/…
Gradient-flowed operator product expansion without IR renormalons
Martin Beneke (TU Munich), Hiromasa Takaura (Kyoto University)
https://arxiv.org/abs/2510.12193 https://…
Gradient-based search of quantum phases: discovering unconventional fractional Chern insulators
André Grossi Fonseca, Eric Wang, Sachin Vaidya, Patrick J. Ledwith, Ashvin Vishwanath, Marin Soljačić
https://arxiv.org/abs/2509.10438
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
Chengyu Wang, Paria Rashidinejad, DiJia Su, Song Jiang, Sid Wang, Siyan Zhao, Cai Zhou, Shannon Zejiang Shen, Feiyu Chen, Tommi Jaakkola, Yuandong Tian, Bo Liu
https://arxiv.org/abs/2510.09541
Adaptive Conditional Gradient Descent
Abbas Khademi, Antonio Silveti-Falls
https://arxiv.org/abs/2510.11440 https://arxiv.org/pdf/2510.11440
An Invitation to Obstruction Bundle Gluing Through Morse Flow Lines
Ipsita Datta, Yuan Yao
https://arxiv.org/abs/2510.10393 https://arxiv.org/pdf/2510.1039…
Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis
Matias D. Cattaneo, Boris Shigida
https://arxiv.org/abs/2509.08483 https://arxiv.org/pdf/…
Temporal Variabilities Limit Convergence Rates in Gradient-Based Online Optimization
Bryan Van Scoy, Gianluca Bianchin
https://arxiv.org/abs/2510.12512 https://
Liouville results for $(p,q)$-Laplacian elliptic equations with source terms involving gradient nonlinearities
Mousomi Bhakta, Anup Biswas, Roberta Filippucci
https://arxiv.org/abs/2510.12486
Statistical Benchmarking of Optimization Methods for Variational Quantum Eigensolver under Quantum Noise
Silvie Illésová, Tomáš Bezděk, Vojtěch Novák, Bruno Senjean, Martin Beseda
https://arxiv.org/abs/2510.08727
Reliability Sensitivity with Response Gradient
Siu-Kui Au, Zi-Jun Cao
https://arxiv.org/abs/2510.09315 https://arxiv.org/pdf/2510.09315
Google rolls out its new gradient "G" icon company-wide, saying it "now represents all of Google ... and visually reflects our evolution in the AI era" (Abner Li/9to5Google)
https://9to5google.com/2025/09/29/google-g-gradient-company-icon/
Simple Projection Variants Improve ColBERT Performance
Benjamin Clavié, Sean Lee, Rikiya Takehi, Aamir Shakir, Makoto P. Kato
https://arxiv.org/abs/2510.12327 https://
Locally Permuted Low Rank Column-wise Sensing
Ahmed Ali Abbasi, Namrata Vaswani
https://arxiv.org/abs/2509.09820 https://arxiv.org/pdf/2509.09820
A Gradient Guided Diffusion Framework for Chance Constrained Programming
Boyang Zhang, Zhiguo Wang, Ya-Feng Liu
https://arxiv.org/abs/2510.12238 https://ar…
Predictive Spike Timing Enables Distributed Shortest Path Computation in Spiking Neural Networks
Simen Storesund, Kristian Valset Aars, Robin Dietrich, Nicolai Waniek
https://arxiv.org/abs/2509.10077
A framework for realisable data-driven active flow control using model predictive control applied to a simplified truck wake
Alberto Solera-Rico, Carlos Sanmiguel Vila, Stefano Discetti
https://arxiv.org/abs/2510.11600
From Morse Functions to Lefschetz Fibrations on Cotangent Bundles
Emmanuel Giroux
https://arxiv.org/abs/2510.10669 https://arxiv.org/pdf/2510.10669
A Differentiable Surrogate Model for the Generation of Radio Pulses from In-Ice Neutrino Interactions
Philipp Pilar, Martin Ravn, Christian Glaser, Niklas Wahlström
https://arxiv.org/abs/2509.10274
The Hidden Width of Deep ResNets: Tight Error Bounds and Phase Diagrams
Lénaïc Chizat
https://arxiv.org/abs/2509.10167 https://arxiv.org/pdf/2…
Replaced article(s) found for cs.GR. https://arxiv.org/list/cs.GR/new
[1/1]:
- GASP: A Gradient-Aware Shortest Path Algorithm for Boundary-Confined Visualization of 2-Manifold ...
Sefat E. Rahman, Tushar M. Athawale, Paul Rosen
Building Gradient by Gradient: Decentralised Energy Functions for Bimanual Robot Assembly
Alexander L. Mitchell, Joe Watson, Ingmar Posner
https://arxiv.org/abs/2510.04696 https…
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/3]:
- Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Rinaldi, Panariello, Salici, Liu, Ciccone, Porrello, Calderara
On curvature estimates for four-dimensional gradient Ricci solitons
Huai-Dong Cao
https://arxiv.org/abs/2510.06059 https://arxiv.org/pdf/2510.06059
Gradient-Guided Furthest Point Sampling for Robust Training Set Selection
Morris Trestman, Stefan Gugler, Felix A. Faber, O. A. von Lilienfeld
https://arxiv.org/abs/2510.08906 h…
New Classes of Non-monotone Variational Inequality Problems Solvable via Proximal Gradient on Smooth Gap Functions
Lei Zhao, Daoli Zhu, Shuzhong Zhang
https://arxiv.org/abs/2510.12105
Stable High-Order Vortices in Spin-Orbit-Coupled Spin-1 Bose-Einstein Condensates
Xin-Feng Zhang, Huan-Bo Luo, Josep Batle, Bin Liu, Yongyao Li
https://arxiv.org/abs/2510.09832 …
Rotational radial shear in the low solar photosphere. Direct detection from high-resolution spectro-imaging
T. Corbard (Université Côte d'Azur, Observatoire de la Côte d'Azur, CNRS, Laboratoire Lagrange, Nice, France), M. Faurobert (Université Côte d'Azur, Observatoire de la Côte d'Azur, CNRS, Laboratoire Lagrange, Nice, France), B. Gelly (CNRS-IRL2009, Tenerife, Spain), R. Douet (CNRS-IRL2009, Tenerife, Spain), D. Laforgue (CNRS-IRL2009, Tenerife, S…
SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression
Biao Zhang, Lixin Chen, Tong Liu, Bo Zheng
https://arxiv.org/abs/2510.12474 https://
Running man spreads his arms like the wings of an airplane
If you're in a hurry to get more running photos check my Behance at https://www.behance.net/gallery/234496843/Ithaca-5-and-10-2025
Effective Atom Theory: Gradient-Driven ab initio Materials Design
Justin Tahmassebpur, Brandon Li, Boris Barron, Héctor Abruña, Peter Frazier, Tomás Arias
https://arxiv.org/abs/2509.07180
A Computable Measure of Suboptimality for Entropy-Regularised Variational Objectives
Clémentine Chazal, Heishiro Kanagawa, Zheyang Shen, Anna Korba, Chris J. Oates
https://arxiv.org/abs/2509.10393
Thermal gradient-driven skyrmion dynamics with near-zero skyrmion Hall angle
Yogesh Kumar, Hurmal Saren, Pintu Das
https://arxiv.org/abs/2510.07020 https://
GCond: Gradient Conflict Resolution via Accumulation-based Stabilization for Large-Scale Multi-Task Learning
Evgeny Alves Limarenko, Anastasiia Alexandrovna Studenikina
https://arxiv.org/abs/2509.07252
Replaced article(s) found for hep-ph. https://arxiv.org/list/hep-ph/new
[1/2]:
- Simple Gradient Flow Equation for the Bounce Solution
Ryosuke Sato
https://
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/8]:
- Boosting Adversarial Transferability via Commonality-Oriented Gradient Optimization
Yanting Gao, Yepeng Liu, Junming Liu, Qi Zhang, Hongyun Zhang, Duoqian Miao, Cairong Zhao
Thermodynamically Consistent Continuum Theory of Magnetic Particles in High-Gradient Fields
Marko Tesanovic, Daniel M. Markiewitz, Marcus L. Popp, Martin Z. Bazant, Sonja Berensmeier
https://arxiv.org/abs/2510.07552
Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation
Shuo Shao, Yiming Li, Hongwei Yao, Yifei Chen, Yuchen Yang, Zhan Qin
https://arxiv.org/abs/2510.06605
Active Subspaces in Infinite Dimension
Poorbita Kundu, Nathan Wycoff
https://arxiv.org/abs/2510.11871 https://arxiv.org/pdf/2510.11871
On the maximum bound principle and energy dissipation of exponential time differencing methods for the chiral liquid crystal blue phases
Wenshuai Hu, Guanghua Ji
https://arxiv.org/abs/2510.12499
Infinite Interacting Brownian Motions and EVI Gradient Flows
Kohei Suzuki
https://arxiv.org/abs/2509.06869 https://arxiv.org/pdf/2509.06869
(Adaptive) Scaled gradient methods beyond locally Hölder smoothness: Lyapunov analysis, convergence rate and complexity
Susan Ghaderi, Morteza Rahimi, Yves Moreau, Masoud Ahookhosh
https://arxiv.org/abs/2511.10425 https://arxiv.org/pdf/2511.10425 https://arxiv.org/html/2511.10425
arXiv:2511.10425v1 Announce Type: new
Abstract: This paper addresses the unconstrained minimization of smooth convex functions whose gradients are locally Hölder continuous. We analyze the Scaled Gradient Algorithm (SGA) under these local smoothness assumptions, proving its global convergence and iteration complexity. Furthermore, under local strong convexity and the Kurdyka-Łojasiewicz (KL) inequality, we establish linear convergence rates and provide explicit complexity bounds. In particular, we show that when the gradient is locally Lipschitz continuous, SGA attains linear convergence for any KL exponent. We then introduce and analyze an adaptive variant of SGA (AdaSGA), which automatically adjusts the scaling and step-size parameters; for this method, we show global convergence and derive local linear rates under strong convexity.
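A minimal numpy sketch of the scaled-gradient iteration discussed above, assuming a diagonal scaling and an Armijo backtracking step size; the paper's actual SGA/AdaSGA scaling and step-size rules are not reproduced, and the badly scaled quadratic is purely illustrative.

import numpy as np

def scaled_gradient_descent(f, grad, x0, scale, c=1e-4, shrink=0.5,
                            tol=1e-8, max_iter=500):
    # Scaled gradient step x+ = x + t*d with d = -D(x) * grad(x),
    # t chosen by Armijo backtracking (sufficient decrease).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = -scale(x) * g                      # diagonal preconditioning
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + c * t * g.dot(d):
            t *= shrink
        x = x + t * d
    return x

# Toy problem: convex quadratic with badly scaled coordinates.
H = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ H @ x
grad = lambda x: H @ x
scale = lambda x: 1.0 / np.diag(H)             # ideal scaling for this toy
print(scaled_gradient_descent(f, grad, np.array([1.0, 1.0]), scale))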
Galaxy Metallicity Gradients in the Reionization Epoch from the FIRE-2 Simulations
Xunda Sun, Xin Wang, Fangzhou Jiang, Houjun Mo, Luis C. Ho, Qianqiao Zhou, Xiangcheng Ma, Hu Zhan, Andrew Wetzel, Russell L. Graf, Philip F. Hopkins, Dusan Keres, Jonathan Stern
https://arxiv.org/abs/2510.08997
Argus: JAX state-space filtering for gravitational wave detection with a pulsar timing array
Tom Kimpson, Nicholas J. O'Neill, Patrick M. Meyers, Andrew Melatos
https://arxiv.org/abs/2510.11077
Evidence for easy-plane XY ferromagnetism in heavy-fermion quantum-critical CeRh6Ge4
Riku Yamamoto, Sejun Park, Zachary W. Riedel, Phurba Sherpa, Joe D. Thompson, Filip Ronning, Eric D. Bauer, Adam P. Dioguardi, Michihiro Hirata
https://arxiv.org/abs/2510.12006
Data-Driven Energy Estimation for Virtual Servers Using Combined System Metrics and Machine Learning
Amandip Sangha
https://arxiv.org/abs/2509.09991 https://
Global Convergence of Four-Layer Matrix Factorization under Random Initialization
Minrui Luo, Weihang Xu, Xiang Gao, Maryam Fazel, Simon Shaolei Du
https://arxiv.org/abs/2511.09925 https://arxiv.org/pdf/2511.09925 https://arxiv.org/html/2511.09925
arXiv:2511.09925v1 Announce Type: new
Abstract: Gradient descent dynamics on the deep matrix factorization problem has been extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well-established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of gradient descent dynamics, and extends previous theories to characterize the change in eigenvalues of layer weights.
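The objective studied in this abstract can be written down directly. Below is a small numpy sketch of randomly initialized gradient descent on a four-layer factorization with a balanced regularization term; the factor shapes, initialization scale, regularization weight, step size, and the loss-decrease safeguard are illustrative assumptions, not the paper's setup.

import numpy as np

rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))                # illustrative target matrix
W = [0.3 * rng.standard_normal((n, n)) for _ in range(4)]   # random init
lam, lr = 1.0, 0.01                            # balance weight, step size

def loss(W):
    P = W[3] @ W[2] @ W[1] @ W[0] - M
    bal = sum(np.linalg.norm(W[i+1].T @ W[i+1] - W[i] @ W[i].T)**2
              for i in range(3))
    return 0.5 * np.linalg.norm(P)**2 + 0.25 * lam * bal

for _ in range(5000):
    P = W[3] @ W[2] @ W[1] @ W[0] - M
    # gradient of the fit term 0.5 * ||W4 W3 W2 W1 - M||_F^2 per factor
    g = [(W[3] @ W[2] @ W[1]).T @ P,
         (W[3] @ W[2]).T @ P @ W[0].T,
         W[3].T @ P @ (W[1] @ W[0]).T,
         P @ (W[2] @ W[1] @ W[0]).T]
    # gradient of (lam/4) * sum_i ||W_{i+1}^T W_{i+1} - W_i W_i^T||_F^2
    for i in range(3):
        D = W[i+1].T @ W[i+1] - W[i] @ W[i].T  # symmetric residual
        g[i+1] += lam * (W[i+1] @ D)
        g[i]   -= lam * (D @ W[i])
    W_try = [Wi - lr * gi for Wi, gi in zip(W, g)]
    if loss(W_try) <= loss(W):
        W = W_try
    else:
        lr *= 0.5                              # crude safeguard for the sketch

print("final loss:", loss(W))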
Stability of asymptotically conical gradient Kähler-Ricci expanders
Longteng Chen
https://arxiv.org/abs/2510.06850 https://arxiv.org/pdf/2510.06850
Decoupling pressure gradient history effects in turbulent boundary layers through high-Reynolds number experiments
Ahmad Zarei, Mitchell Lozier, Rahul Deshpande, Ivan Marusic
https://arxiv.org/abs/2509.07545
Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration
Ahmed Khaled, Satyen Kale, Arthur Douillard, Chi Jin, Rob Fergus, Manzil Zaheer
https://arxiv.org/abs/2509.10439 …
Homogenization of rate-independent elastoplastic spring network models with non-local random fields
Simone Hermann
https://arxiv.org/abs/2509.09872 https://
Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li
https://arxiv.org/abs/2510.11683 http…
PLRV-O: Advancing Differentially Private Deep Learning via Privacy Loss Random Variable Optimization
Qin Yang, Nicholas Stout, Meisam Mohammady, Han Wang, Ayesha Samreen, Christopher J Quinn, Yan Yan, Ashish Kundu, Yuan Hong
https://arxiv.org/abs/2509.06264
Low-Discrepancy Set Post-Processing via Gradient Descent
François Clément, Linhang Huang, Woorim Lee, Cole Smidt, Braeden Sodt, Xuan Zhang
https://arxiv.org/abs/2511.10496 https://arxiv.org/pdf/2511.10496 https://arxiv.org/html/2511.10496
arXiv:2511.10496v1 Announce Type: new
Abstract: The construction of low-discrepancy sets, used for uniform sampling and numerical integration, has recently seen great improvements based on optimization and machine learning techniques. However, these methods are computationally expensive, often requiring days of computation or access to GPU clusters. We show that simple gradient descent-based techniques allow for comparable results when starting with a reasonably uniform point set. Not only is this method much more efficient and accessible, but it can be applied as post-processing to any low-discrepancy set generation method for a variety of standard discrepancy measures.
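A hedged illustration of the post-processing idea: starting from a random point set, run gradient descent on the squared L2 star discrepancy, which has a closed form due to Warnock. The paper's exact objective and gradient computation are not reproduced; this sketch uses finite-difference gradients to stay short, which is only viable for small point sets.

import numpy as np

def l2_star_discrepancy_sq(X):
    # Warnock's closed form for the squared L2 star discrepancy in [0,1]^d.
    N, d = X.shape
    t1 = (1.0 / 3.0) ** d
    t2 = -(2.0 ** (1 - d) / N) * np.sum(np.prod(1.0 - X**2, axis=1))
    mx = np.maximum(X[:, None, :], X[None, :, :])   # pairwise coordinate maxima
    t3 = np.sum(np.prod(1.0 - mx, axis=2)) / N**2
    return t1 + t2 + t3

def descend(X, lr=0.5, steps=300, eps=1e-6):
    # Finite-difference gradient descent, projecting back to the unit cube.
    X = X.copy()
    for _ in range(steps):
        base = l2_star_discrepancy_sq(X)
        g = np.zeros_like(X)
        for idx in np.ndindex(*X.shape):            # numeric gradient (small N only)
            Xp = X.copy(); Xp[idx] += eps
            g[idx] = (l2_star_discrepancy_sq(Xp) - base) / eps
        X = np.clip(X - lr * g, 0.0, 1.0)
    return X

rng = np.random.default_rng(1)
X = rng.random((32, 2))                             # a "reasonably uniform" start
print("before:", l2_star_discrepancy_sq(X) ** 0.5)
print("after: ", l2_star_discrepancy_sq(descend(X)) ** 0.5)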
Hybrid Quantum-Classical Policy Gradient for Adaptive Control of Cyber-Physical Systems: A Comparative Study of VQC vs. MLP
Aueaphum Aueawatthanaphisut, Nyi Wunna Tun
https://arxiv.org/abs/2510.06010
Stochastic Gradient Descent for Incomplete Tensor Linear Systems
Anna Ma, Deanna Needell, Alexander Xue
https://arxiv.org/abs/2510.07630 https://arxiv.org/…
Zeroes of Eigenfunctions of Schrödinger Operators after Schwartzman
Willie Wai-Yeung Wong
https://arxiv.org/abs/2509.09739 https://arxiv.org/pdf/250…
Curvature pinching of asymptotically conical gradient expanding Ricci solitons
Huai-Dong Cao, Junming Xie
https://arxiv.org/abs/2510.05075 https://arxiv.or…
S-D-RSM: Stochastic Distributed Regularized Splitting Method for Large-Scale Convex Optimization Problems
Maoran Wang, Xingju Cai, Yongxin Chen
https://arxiv.org/abs/2511.10133 https://arxiv.org/pdf/2511.10133 https://arxiv.org/html/2511.10133
arXiv:2511.10133v1 Announce Type: new
Abstract: This paper investigates large-scale distributed composite convex optimization problems, with motivations from a broad range of applications, including multi-agent systems, federated learning, smart grids, wireless sensor networks, compressed sensing, and so on. Stochastic gradient descent (SGD) and its variants are commonly employed to solve such problems. However, existing algorithms often rely on vanishing step sizes, strong convexity assumptions, or entail substantial computational overhead to ensure convergence or obtain favorable complexity. To bridge the gap between theory and practice, we integrate consensus optimization and operator splitting techniques (see Problem Reformulation) to develop a novel stochastic splitting algorithm, termed the stochastic distributed regularized splitting method (S-D-RSM). In practice, S-D-RSM performs parallel updates of proximal mappings and gradient information for only a randomly selected subset of agents at each iteration. By introducing regularization terms, it effectively mitigates consensus discrepancies among distributed nodes. In contrast to conventional stochastic methods, our theoretical analysis establishes that S-D-RSM achieves global convergence without requiring diminishing step sizes or strong convexity assumptions. Furthermore, it achieves an iteration complexity of $\mathcal{O}(1/\epsilon)$ with respect to both the objective function value and the consensus error. Numerical experiments show that S-D-RSM achieves up to 2–3× speedup compared to state-of-the-art baselines, while maintaining comparable or better accuracy. These results not only validate the algorithm's theoretical guarantees but also demonstrate its effectiveness in practical tasks such as compressed sensing and empirical risk minimization.
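A schematic sketch of the update pattern the abstract describes: at each iteration only a random subset of agents performs a proximal/gradient update, and a regularized consensus step merges the local copies. The concrete rules below (distributed LASSO, soft-thresholding, the rho-weighted pull toward consensus) are hypothetical stand-ins, not the S-D-RSM iterations from the paper.

import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Distributed LASSO: min_x sum_i 0.5 * ||A_i x - b_i||^2 + lam * ||x||_1,
# one (A_i, b_i) block per agent.
rng = np.random.default_rng(0)
m, n, agents, lam = 20, 10, 5, 0.1
A = [rng.standard_normal((m, n)) for _ in range(agents)]
x_true = np.zeros(n); x_true[:3] = 1.0
b = [Ai @ x_true + 0.01 * rng.standard_normal(m) for Ai in A]

x = [np.zeros(n) for _ in range(agents)]   # local copies
z = np.zeros(n)                            # consensus variable
alpha, rho = 0.005, 0.5                    # step size, consensus regularization

for _ in range(500):
    # only a random subset of agents updates at this iteration
    subset = rng.choice(agents, size=2, replace=False)
    for i in subset:
        grad_i = A[i].T @ (A[i] @ x[i] - b[i])
        # local proximal-gradient step pulled toward the consensus variable
        x[i] = soft_threshold(x[i] - alpha * (grad_i + rho * (x[i] - z)),
                              alpha * lam / agents)
    z = np.mean(x, axis=0)                 # consensus update

print("recovered solution:", np.round(z, 2))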
Gradient Flows of Interfacial Energies: Curvature Agents and Incompressibility
Keith Promislow, Truong Vu, Brian Wetton
https://arxiv.org/abs/2509.07380 https://
On the Theory of Continual Learning with Gradient Descent for Neural Networks
Hossein Taheri, Avishek Ghosh, Arya Mazumdar
https://arxiv.org/abs/2510.05573 https://
On the optimization dynamics of RLVR: Gradient gap and step size thresholds
Joe Suk, Yaqi Duan
https://arxiv.org/abs/2510.08539 https://arxiv.org/pdf/2510.…
Replaced article(s) found for math.OC. https://arxiv.org/list/math.OC/new
[1/1]:
- A robust BFGS algorithm for unconstrained nonlinear optimization problems
Yaguang Yang
https://arxiv.org/abs/1212.5929
- Quantum computing and the stable set problem
Aljaž Krpan, Janez Povh, Dunja Pucher
https://arxiv.org/abs/2405.12845 https://mastoxiv.page/@arXiv_mathOC_bot/112483516437815686
- Mean Field Game with Reflected Jump Diffusion Dynamics: A Linear Programming Approach
Zongxia Liang, Xiang Yu, Keyu Zhang
https://arxiv.org/abs/2508.20388 https://mastoxiv.page/@arXiv_mathOC_bot/115111048711698998
- Differential Dynamic Programming for the Optimal Control Problem with an Ellipsoidal Target Set a...
Sungjun Eom, Gyunghoon Park
https://arxiv.org/abs/2509.07546 https://mastoxiv.page/@arXiv_mathOC_bot/115179281556444440
- On the Moreau envelope properties of weakly convex functions
Marien Renaud, Arthur Leclaire, Nicolas Papadakis
https://arxiv.org/abs/2509.13960 https://mastoxiv.page/@arXiv_mathOC_bot/115224514482363803
- Automated algorithm design via Nevanlinna-Pick interpolation
Ibrahim K. Ozaslan, Tryphon T. Georgiou, Mihailo R. Jovanovic
https://arxiv.org/abs/2509.21416 https://mastoxiv.page/@arXiv_mathOC_bot/115286533597711930
- Optimal Control of a Bioeconomic Crop-Energy System with Energy Reinvestment
Othman Cherkaoui Dekkaki
https://arxiv.org/abs/2510.11381 https://mastoxiv.page/@arXiv_mathOC_bot/115372322896073250
- Point Convergence Analysis of the Accelerated Gradient Method for Multiobjective Optimization: Co...
Yingdong Yin
https://arxiv.org/abs/2510.26382 https://mastoxiv.page/@arXiv_mathOC_bot/115468018035252078
- History-Aware Adaptive High-Order Tensor Regularization
Chang He, Bo Jiang, Yuntian Jiang, Chuwen Zhang, Shuzhong Zhang
https://arxiv.org/abs/2511.05788
- Equivalence of entropy solutions and gradient flows for pressureless 1D Euler systems
José Antonio Carrillo, Sondre Tesdal Galtung
https://arxiv.org/abs/2312.04932 https://mastoxiv.page/@arXiv_mathAP_bot/111560077272113052
- Kernel Modelling of Fading Memory Systems
Yongkang Huo, Thomas Chaffey, Rodolphe Sepulchre
https://arxiv.org/abs/2403.11945 https://mastoxiv.page/@arXiv_eessSY_bot/112121123836064435
- The Maximum Theoretical Ground Speed of the Wheeled Vehicle
Altay Zhakatayev, Mukatai Nemerebayev
https://arxiv.org/abs/2502.15341 https://mastoxiv.page/@arXiv_physicsclassph_bot/114057765769441123
- Hessian stability and convergence rates for entropic and Sinkhorn potentials via semiconcavity
Giacomo Greco, Luca Tamanini
https://arxiv.org/abs/2504.11133 https://mastoxiv.page/@arXiv_mathPR_bot/114346453424694503
- Optimizing the ground state energy of the three-dimensional magnetic Dirichlet Laplacian with con...
Matthias Baur
https://arxiv.org/abs/2504.21597 https://mastoxiv.page/@arXiv_mathph_bot/114431404740241516
- A localized consensus-based sampling algorithm
Arne Bouillon, Alexander Bodard, Panagiotis Patrinos, Dirk Nuyens, Giovanni Samaey
https://arxiv.org/abs/2505.24861 https://mastoxiv.page/@arXiv_mathNA_bot/114612580684567066
- A Novel Sliced Fused Gromov-Wasserstein Distance
Moritz Piening, Robert Beinert
https://arxiv.org/abs/2508.02364 https://mastoxiv.page/@arXiv_csLG_bot/114976243138728278
- Minimal Regret Walras Equilibria for Combinatorial Markets via Duality, Integrality, and Sensitiv...
Aloïs Duguet, Tobias Harks, Martin Schmidt, Julian Schwarz
https://arxiv.org/abs/2511.09021 https://mastoxiv.page/@arXiv_csGT_bot/115541243299714775
Evaluating the Impact of Adversarial Attacks on Traffic Sign Classification using the LISA Dataset
Nabeyou Tadessa, Balaji Iyangar, Mashrur Chowdhury
https://arxiv.org/abs/2509.06835
Computing Wasserstein Barycenters through Gradient Flows
Eduardo Fernandes Montesuma, Yassir Bendou, Mike Gartrell
https://arxiv.org/abs/2510.04602 https://
Asymptotic behaviour of the weak inverse anisotropic mean curvature flow
Chaoqun Gao, Yong Wei, Rong Zhou
https://arxiv.org/abs/2510.08168 https://arxiv.or…
Weight Initialization and Variance Dynamics in Deep Neural Networks and Large Language Models
Yankun Han
https://arxiv.org/abs/2510.09423 https://arxiv.org…
Linear Convergence of a Unified Primal--Dual Algorithm for Convex--Concave Saddle Point Problems with Quadratic Growth
Cody Melcher, Afrooz Jalilzadeh, Erfan Yazdandoost Hamedani
https://arxiv.org/abs/2510.11990
A gradient estimate for the linearized translator equation
Kyeongsu Choi, Robert Haslhofer, Or Hershkovits
https://arxiv.org/abs/2509.07629 https://arxiv.o…
Accelerated stochastic first-order method for convex optimization under heavy-tailed noise
Chuan He, Zhaosong Lu
https://arxiv.org/abs/2510.11676 https://a…
Linear Algebra Problems Solved by Using Damped Dynamical Systems on the Stiefel Manifold
M Gulliksson, A Oleynik, M Ogren, R Bakhshandeh-Chamazkoti
https://arxiv.org/abs/2510.10535
NeST-BO: Fast Local Bayesian Optimization via Newton-Step Targeting of Gradient and Hessian Information
Wei-Ting Tang, Akshay Kudva, Joel A. Paulson
https://arxiv.org/abs/2510.05516
Statistical Inference for Gradient Boosting Regression
Haimo Fang, Kevin Tan, Giles Hooker
https://arxiv.org/abs/2509.23127 https://arxiv.org/pdf/2509.2312…
Learning Mean-Field Games through Mean-Field Actor-Critic Flow
Mo Zhou, Haosheng Zhou, Ruimeng Hu
https://arxiv.org/abs/2510.12180 https://arxiv.org/pdf/25…
Flatness-Aware Stochastic Gradient Langevin Dynamics
Stefano Bruno, Youngsik Hwang, Jaehyeon An, Sotirios Sabanis, Dong-Young Lim
https://arxiv.org/abs/2510.02174 https://
Gradient regularity for widely degenerate parabolic equations
Michael Strunk
https://arxiv.org/abs/2510.07999 https://arxiv.org/pdf/2510.07999
Convexity of Optimization Curves: Local Sharp Thresholds, Robustness Impossibility, and New Counterexamples
Le Duc Hieu
https://arxiv.org/abs/2509.08954 https://
Convergence of Stochastic Gradient Methods for Wide Two-Layer Physics-Informed Neural Networks
Bangti Jin, Longjun Wu
https://arxiv.org/abs/2508.21571 https://
AdaBet: Gradient-free Layer Selection for Efficient Training of Deep Neural Networks
Irene Tenison, Soumyajit Chatterjee, Fahim Kawsar, Mohammad Malekzadeh
https://arxiv.org/abs/2510.03101
Global Solutions to Non-Convex Functional Constrained Problems with Hidden Convexity
Ilyas Fatkhullin, Niao He, Guanghui Lan, Florian Wolf
https://arxiv.org/abs/2511.10626 https://arxiv.org/pdf/2511.10626 https://arxiv.org/html/2511.10626
arXiv:2511.10626v1 Announce Type: new
Abstract: Constrained non-convex optimization is fundamentally challenging, as global solutions are generally intractable and constraint qualifications may not hold. However, in many applications, including safe policy optimization in control and reinforcement learning, such problems possess hidden convexity, meaning they can be reformulated as convex programs via a nonlinear invertible transformation. Typically, such transformations are implicit or unknown, making a direct link with the convex program impossible. On the other hand, (sub-)gradients with respect to the original variables are often accessible or can be easily estimated, which motivates algorithms that operate directly in the original (non-convex) problem space using standard (sub-)gradient oracles. In this work, we develop the first algorithms to provably solve such non-convex problems to global minima. First, using a modified inexact proximal point method, we establish global last-iterate convergence guarantees with $\widetilde{\mathcal{O}}(\varepsilon^{-3})$ oracle complexity in the non-smooth setting. For smooth problems, we propose a new bundle-level type method based on linearly constrained quadratic subproblems, improving the oracle complexity to $\widetilde{\mathcal{O}}(\varepsilon^{-1})$. Surprisingly, despite non-convexity, our methodology does not require any constraint qualifications, can handle hidden convex equality constraints, and achieves complexities matching those for solving unconstrained hidden convex optimization.
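A generic sketch of the outer/inner structure of an inexact proximal point method on a toy hidden-convex objective f(x) = g(u(x)), with g convex and u an invertible nonlinear reparametrization; the paper's modified method, its accuracy conditions, and the complexity-optimal inner solves are not reproduced.

import numpy as np

def inexact_proximal_point(grad, x0, mu=1.0, outer=50, inner=100, lr=0.01):
    # Outer loop: approximately minimize f(y) + (mu/2) * ||y - x_k||^2
    # with a fixed budget of inner gradient steps (hence "inexact").
    x = np.asarray(x0, dtype=float)
    for _ in range(outer):
        y = x.copy()
        for _ in range(inner):
            y -= lr * (grad(y) + mu * (y - x))
        x = y
    return x

# Hidden convexity: f(x) = 0.5 * ||u(x) - 1||^2 with u(x) = x^3 componentwise,
# nonconvex in x but convex in u. Only gradients in x-space are used.
g_grad = lambda u: u - 1.0
u  = lambda x: x**3
du = lambda x: 3.0 * x**2                   # diagonal Jacobian of u
grad = lambda x: du(x) * g_grad(u(x))       # chain rule in the original variables

print(inexact_proximal_point(grad, x0=np.array([2.0, 0.5])))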
On fundamental properties of high-order forward-backward envelope
Alireza Kabgani, Masoud Ahookhosh
https://arxiv.org/abs/2511.10421 https://arxiv.org/pdf/2511.10421 https://arxiv.org/html/2511.10421
arXiv:2511.10421v1 Announce Type: new
Abstract: This paper studies the fundamental properties of the high-order forward-backward splitting mapping (HiFBS) and its associated forward-backward envelope (HiFBE) through the lens of high-order regularization for nonconvex composite functions. Specifically, we (i) establish the boundedness and uniform boundedness of HiFBS, along with the Hölder and Lipschitz continuity of HiFBE; (ii) derive an explicit form for the subdifferentials of HiFBE; and (iii) investigate necessary and sufficient conditions for the differentiability and weak smoothness of HiFBE under suitable assumptions. By leveraging the prox-regularity of $g$ and the concept of $p$-calmness, we further demonstrate the local single-valuedness and continuity of HiFBS, which in turn guarantee the differentiability of HiFBE in neighborhoods of calm points. This paves the way for the development of gradient-based algorithms tailored to nonconvex composite optimization problems.
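A numerical illustration of the envelope object, assuming the standard high-order definition HiFBE(x) = min_y f(x) + f'(x)(y - x) + g(y) + |y - x|^p / (p * gamma), with HiFBS(x) the minimizer; the paper's precise normalization and assumptions may differ. Brute force in one dimension:

import numpy as np

def hifbs_hifbe(f, df, g, x, gamma=0.5, p=3.0):
    # Evaluate the high-order forward-backward envelope at x by grid search,
    # returning the minimizing point (HiFBS) and the envelope value (HiFBE).
    grid = np.linspace(x - 5.0, x + 5.0, 20001)
    vals = f(x) + df(x) * (grid - x) + g(grid) + np.abs(grid - x)**p / (p * gamma)
    i = np.argmin(vals)
    return grid[i], vals[i]

# Composite example: smooth f(x) = 0.5 x^2, nonsmooth g(x) = |x|.
f  = lambda x: 0.5 * x**2
df = lambda x: x
g  = lambda x: np.abs(x)

for x in [2.0, 0.5, -1.0]:
    y, env = hifbs_hifbe(f, df, g, x)
    print(f"x={x:5.2f}  HiFBS(x)={y:7.4f}  HiFBE(x)={env:7.4f}  phi(x)={f(x)+g(x):7.4f}")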
Minimizing smooth Kurdyka-Łojasiewicz functions via generalized descent methods: Convergence rate and complexity
Masoud Ahookhosh, Susan Ghaderi, Alireza Kabgani, Morteza Rahimi
https://arxiv.org/abs/2511.10414 https://arxiv.org/pdf/2511.10414 https://arxiv.org/html/2511.10414
arXiv:2511.10414v1 Announce Type: new
Abstract: This paper addresses the generalized descent algorithm (DEAL) for minimizing smooth functions, which is analyzed under the Kurdyka-Łojasiewicz (KL) inequality. In particular, the suggested algorithm guarantees a sufficient decrease by adapting to the cost function's geometry. We leverage the KL property to establish the global convergence, convergence rates, and complexity. A particular focus is placed on the linear convergence of generalized descent methods. We show that the constant step-size and Armijo line search strategies along a generalized descent direction satisfy our generalized descent condition. Additionally, for nonsmooth functions, by leveraging smoothing techniques such as the forward-backward and high-order Moreau envelopes, we show that the boosted proximal gradient method (BPGA) and the boosted high-order proximal-point method (BPPA) are specific cases of DEAL. It is notable that if the order of the high-order proximal term is chosen in a certain way (depending on the KL exponent), then the sequence generated by BPPA converges linearly for an arbitrary KL exponent. Our preliminary numerical experiments on inverse problems and LASSO demonstrate the efficiency of the proposed methods, validating our theoretical findings.
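The boosted high-order proximal-point idea (BPPA) can be caricatured in one dimension: solve a high-order proximal subproblem, then extrapolate ("boost") along the resulting direction while the objective keeps decreasing. The grid solve, doubling line search, and parameters below are illustrative assumptions, not the paper's method.

import numpy as np

def bppa_sketch(f, x0, lam=1.0, p=3.0, outer=30):
    # At each step solve y = argmin_u f(u) + |u - x|^p / (p * lam)
    # by brute force on a grid, then boost along d = y - x.
    x = float(x0)
    for _ in range(outer):
        grid = np.linspace(x - 4.0, x + 4.0, 8001)
        y = grid[np.argmin(f(grid) + np.abs(grid - x)**p / (p * lam))]
        d, t = y - x, 1.0
        while f(x + 2.0 * t * d) < f(x + t * d):   # simple doubling line search
            t *= 2.0
        x = x + t * d
    return x

f = lambda u: (u**2 - 1.0)**2          # nonconvex, minima at u = +/-1
print(bppa_sketch(f, x0=3.0))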
Value bounds and Convergence Analysis for Averages of LRP attributions
Alexander Binder, Nastaran Takmil-Homayouni, Urun Dogan
https://arxiv.org/abs/2509.08963 https://
Inductive inference of gradient-boosted decision trees on graphs for insurance fraud detection
Félix Vandervorst, Bruno Deprez, Wouter Verbeke, Tim Verdonck
https://arxiv.org/abs/2510.05676
Balancing Utility and Privacy: Dynamically Private SGD with Random Projection
Zhanhong Jiang, Md Zahid Hasan, Nastaran Saadati, Aditya Balu, Chao Liu, Soumik Sarkar
https://arxiv.org/abs/2509.09485
Linear Convergence of Gradient Descent for Quadratically Regularized Optimal Transport
Alberto González-Sanz, Marcel Nutz, Andrés Riveros Valdevenito
https://arxiv.org/abs/2509.08547
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/5]:
- Convergence Analysis of Asynchronous Federated Learning with Gradient Compression for Non-Convex ...
Diying Yang, Yingwei Hou, Weigang Wu
Approximate Bregman proximal gradient algorithm with variable metric Armijo--Wolfe line search
Kiwamu Fujiki, Shota Takahashi, Akiko Takeda
https://arxiv.org/abs/2510.06615 http…
Data-driven multifidelity and multiscale topology optimization based on phasor-based evolutionary de-homogenization
Shuzhi Xu, Yifan Guo, Hiroki Kawabe, Kentaro Yaji
https://arxiv.org/abs/2510.08830
Correlating Cross-Iteration Noise for DP-SGD using Model Curvature
Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng
https://arxiv.org/abs/2510.05416 https:…
Stochastic versus Deterministic in Stochastic Gradient Descent
Runze Li, Jintao Xu, Wenxun Xing
https://arxiv.org/abs/2509.02912 https://arxiv.org/pdf/2509…
Quantitative Convergence Analysis of Projected Stochastic Gradient Descent for Non-Convex Losses via the Goldstein Subdifferential
Yuping Zheng, Andrew Lamperski
https://arxiv.org/abs/2510.02735
Robust Tangent Space Estimation via Laplacian Eigenvector Gradient Orthogonalization
Dhruv Kohli, Sawyer J. Robertson, Gal Mishne, Alexander Cloninger
https://arxiv.org/abs/2510.02308
Finding a Multiple Follower Stackelberg Equilibrium: A Fully First-Order Method
April Niu, Kai Wang, Juba Ziani
https://arxiv.org/abs/2509.08161 https://ar…
Towards understanding Accelerated Stein Variational Gradient Flow -- Analysis of Generalized Bilinear Kernels for Gaussian target distributions
Viktor Stein, Wuchen Li
https://arxiv.org/abs/2509.04008 …
On the Perturbed Projection-Based Distributed Gradient-Descent Algorithm: A Fully-Distributed Adaptive Redesign
Tarek Bazizi, Mohamed Maghenem, Paolo Frasca, Antonio Loría, Elena Panteley
https://arxiv.org/abs/2509.03443