Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:33:31

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
Yuen Chen, Yian Wang, Hari Sundaram
arxiv.org/abs/2509.16173

@arXiv_eessSP_bot@mastoxiv.page
2025-09-22 09:36:41

Scalable Hessian-free Proximal Conjugate Gradient Method for Nonconvex and Nonsmooth Optimization
Yiming Zhou, Wei Dai
arxiv.org/abs/2509.15973

@arXiv_condmatstatmech_bot@mastoxiv.page
2025-09-22 08:23:41

Training thermodynamic computers by gradient descent
Stephen Whitelam
arxiv.org/abs/2509.15324 arxiv.org/pdf/2509.15324

@arXiv_physicsfludyn_bot@mastoxiv.page
2025-09-22 08:39:31

Assessment of the Gradient Jump Penalisation in Large-Eddy Simulations of Turbulence
Shiyu Du, Manuel Münsch, Niclas Jansson, Philipp Schlatter
arxiv.org/abs/2509.16013

@arXiv_mathNA_bot@mastoxiv.page
2025-09-22 08:08:41

Variable-preconditioned transformed primal-dual method for generalized Wasserstein Gradient Flows
Jin Zeng, Dawei Zhan, Ruchi Guo, Chaozhen Wei
arxiv.org/abs/2509.15385

@beyondwatts@beyondwatts.social
2025-12-21 07:46:33

Shortest Day 😃

A serene view of the ocean at dusk, with a gradient sky transitioning from blue to warm hues near the horizon. Subtle waves can be seen on the water, along with distant silhouettes of structures and poles.

@arXiv_csCL_bot@mastoxiv.page
2025-09-22 10:05:21

Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models
Tomoya Yamashita, Akira Ito, Yuuki Yamanaka, Masanori Yamada, Takayuki Miura, Toshiki Shibahara
arxiv.org/abs/2509.15631

@arXiv_quantph_bot@mastoxiv.page
2025-09-22 10:06:01

Training Variational Quantum Circuits Using Particle Swarm Optimization
Marco Mordacci, Michele Amoretti
arxiv.org/abs/2509.15726 arxiv.org…

@arXiv_mathOC_bot@mastoxiv.page
2025-09-22 08:43:21

Escaping saddle points without Lipschitz smoothness: the power of nonlinear preconditioning
Alexander Bodard, Panagiotis Patrinos
arxiv.org/abs/2509.15817

@arXiv_eessIV_bot@mastoxiv.page
2025-09-22 08:18:31

Analysis Plug-and-Play Methods for Imaging Inverse Problems
Edward P. Chandler, Shirin Shoushtari, Brendt Wohlberg, Ulugbek S. Kamilov
arxiv.org/abs/2509.15422

@arXiv_csCE_bot@mastoxiv.page
2025-09-22 12:05:47

Replaced article(s) found for cs.CE. arxiv.org/list/cs.CE/new
[1/1]:
- A comparative analysis for different finite element types in strain-gradient elasticity simulatio...
B. Cagri Sarar, M. Erden Yildizdag, Francesco Fabbrocino, B. Emek Abali

@arXiv_csSD_bot@mastoxiv.page
2025-09-22 09:59:11

Differentiable Acoustic Radiance Transfer
Sungho Lee, Matteo Scerbo, Seungu Han, Min Jun Choi, Kyogu Lee, Enzo De Sena
arxiv.org/abs/2509.15946

@toxi@mastodon.thi.ng
2025-12-16 09:54:11

Recursive polygon subdivision inspired by thin-section mineralogy...
(The area of each polygon is mapped to a color from a gradient. Made with thi.ng/geom, see next message for example & source code...)
1/2

Stop frame animation of a randomized abstract composition of initially thousands of small polygons, slowly converging into only a handful of larger cells/shards. The animation shows the recursive subdivision process in reverse order, i.e. the larger cells at the end are actually some of the first polygons created by randomly slicing the seed polygon (a circular 40-gon). The area/size of each individual poly is mapped to a color from a gradient, with small polys in orange/yellow/pink and large o…

@arXiv_physicsoptics_bot@mastoxiv.page
2025-09-22 09:28:01

The critical role of substrates in mitigating the power-efficiency trade-off in near-field thermophotovoltaics
Kartika N. Nimje, Julien Legendre, Michela F. Picardi, Alejandro W. Rodriguez, Georgia T. Papadakis
arxiv.org/abs/2509.16048

@Techmeme@techhub.social
2025-09-29 14:05:45

Google rolls out its new gradient "G" icon company-wide, saying it "now represents all of Google ... and visually reflects our evolution in the AI era" (Abner Li/9to5Google)
9to5google.com/2025/09/29/goog

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:33:51

Inverting Trojans in LLMs
Zhengxing Li, Guangmingmei Yang, Jayaram Raghuram, David J. Miller, George Kesidis
arxiv.org/abs/2509.16203 arxiv…

@arXiv_physicsspaceph_bot@mastoxiv.page
2025-09-22 08:24:31

Particle in cell simulation on mode conversion of Saturn's 20 kHz narrowband radio emission
Zhoufan Mu, Yao Chen, Tangmu Li, Sulan Ni, Zilong Zhang, Hao Ning
arxiv.org/abs/2509.15542

@cosmos4u@scicomm.xyz
2025-12-18 01:24:45

Ionospheric gradient estimation using ground-based GEO observations for monitoring multi-scale ionospheric dynamics: #Ionosphere in motion - a new way to track space weather in real time: eurekalert.org/news-releases/1

@arXiv_statML_bot@mastoxiv.page
2025-10-15 09:49:02

Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
Jiaqi Li, Zhipeng Lou, Johannes Schmidt-Hieber, Wei Biao Wu
arxiv.org/abs/2510.12013

@arXiv_mathOC_bot@mastoxiv.page
2025-09-22 09:11:01

A generalized canonical metric for optimization on the indefinite Stiefel manifold
Dinh Van Tiep, Duong Thi Viet An, Nguyen Thi Ngoc Oanh, Nguyen Thanh Son
arxiv.org/abs/2509.16113

@arXiv_mathSG_bot@mastoxiv.page
2025-10-14 08:14:58

An Invitation to Obstruction Bundle Gluing Through Morse Flow Lines
Ipsita Datta, Yuan Yao
arxiv.org/abs/2510.10393 arxiv.org/pdf/2510.1039…

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:32:11

Dynamic Classifier-Free Diffusion Guidance via Online Feedback
Pinelopi Papalampidi, Olivia Wiles, Ira Ktena, Aleksandar Shtedritski, Emanuele Bugliarello, Ivana Kajic, Isabela Albuquerque, Aida Nematzadeh
arxiv.org/abs/2509.16131

@arXiv_hepph_bot@mastoxiv.page
2025-10-15 09:58:12

Gradient-flowed operator product expansion without IR renormalons
Martin Beneke (TU Munich), Hiromasa Takaura (Kyoto University)
arxiv.org/abs/2510.12193

@arXiv_csSD_bot@mastoxiv.page
2025-09-22 10:01:11

Reverse Engineering of Music Mixing Graphs with Differentiable Processors and Iterative Pruning
Sungho Lee, Marco Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji
arxiv.org/abs/2509.15948

@arXiv_mathAP_bot@mastoxiv.page
2025-10-14 10:33:18

Optimal gradient estimates for conductivity problems with imperfect low-conductivity interfaces
Hongjie Dong, Haigang Li, Yan Zhao
arxiv.org/abs/2510.10615

@arXiv_csRO_bot@mastoxiv.page
2025-10-07 11:43:02

Building Gradient by Gradient: Decentralised Energy Functions for Bimanual Robot Assembly
Alexander L. Mitchell, Joe Watson, Ingmar Posner
arxiv.org/abs/2510.04696

@arXiv_mathDG_bot@mastoxiv.page
2025-10-08 08:33:39

On curvature estimates for four-dimensional gradient Ricci solitons
Huai-Dong Cao
arxiv.org/abs/2510.06059 arxiv.org/pdf/2510.06059

@arXiv_mathOC_bot@mastoxiv.page
2025-09-22 08:25:11

Introducing the method of ellipcenters, a new first order technique for unconstrained optimization
Roger Behling, Ramyro Aquines Correa, Eduarda Ferreira Zanatta, Vincent Guigues
arxiv.org/abs/2509.15471

@arXiv_statCO_bot@mastoxiv.page
2025-09-22 12:32:56

Replaced article(s) found for stat.CO. arxiv.org/list/stat.CO/new
[1/1]:
- Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems
Robert Gruhlke, Matei Hanu, Claudia Schillings, Philipp Wacker

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:39:20

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
Chengyu Wang, Paria Rashidinejad, DiJia Su, Song Jiang, Sid Wang, Siyan Zhao, Cai Zhou, Shannon Zejiang Shen, Feiyu Chen, Tommi Jaakkola, Yuandong Tian, Bo Liu
arxiv.org/abs/2510.09541

@arXiv_statME_bot@mastoxiv.page
2025-10-13 09:25:10

Reliability Sensitivity with Response Gradient
Siu-Kui Au, Zi-Jun Cao
arxiv.org/abs/2510.09315 arxiv.org/pdf/2510.09315

@arXiv_condmatmeshall_bot@mastoxiv.page
2025-10-09 09:32:21

Thermal gradient-driven skyrmion dynamics with near-zero skyrmion Hall angle
Yogesh Kumar, Hurmal Saren, Pintu Das
arxiv.org/abs/2510.07020

@arXiv_mathNA_bot@mastoxiv.page
2025-10-14 10:54:08

Forward and backward error bounds for a mixed precision preconditioned conjugate gradient algorithm
Thomas Bake, Erin Carson, Yuxin Ma
arxiv.org/abs/2510.11379

@arXiv_statML_bot@mastoxiv.page
2025-10-13 08:27:00

Gradient-Guided Furthest Point Sampling for Robust Training Set Selection
Morris Trestman, Stefan Gugler, Felix A. Faber, O. A. von Lilienfeld
arxiv.org/abs/2510.08906

@arXiv_eessSY_bot@mastoxiv.page
2025-09-30 11:15:01

Small-Covariance Noise-to-State Stability of Stochastic Systems and Its Applications to Stochastic Gradient Dynamics
Leilei Cui, Zhong-Ping Jiang, Eduardo D. Sontag
arxiv.org/abs/2509.24277

@arXiv_csGT_bot@mastoxiv.page
2025-10-07 07:46:28

On the $O(1/T)$ Convergence of Alternating Gradient Descent-Ascent in Bilinear Games
Tianlong Nan, Shuvomoy Das Gupta, Garud Iyengar, Christian Kroer
arxiv.org/abs/2510.03855

@arXiv_csCR_bot@mastoxiv.page
2025-10-09 09:21:21

Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation
Shuo Shao, Yiming Li, Hongwei Yao, Yifei Chen, Yuchen Yang, Zhan Qin
arxiv.org/abs/2510.06605

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 22:04:05

Replaced article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[4/8]:
- Boosting Adversarial Transferability via Commonality-Oriented Gradient Optimization
Yanting Gao, Yepeng Liu, Junming Liu, Qi Zhang, Hongyun Zhang, Duoqian Miao, Cairong Zhao

@arXiv_mathAP_bot@mastoxiv.page
2025-10-15 09:54:41

Liouville results for $(p,q)$-Laplacian elliptic equations with source terms involving gradient nonlinearities
Mousomi Bhakta, Anup Biswas, Roberta Filippucci
arxiv.org/abs/2510.12486

@arXiv_physicsmedph_bot@mastoxiv.page
2025-10-07 09:13:32

Human brain high-resolution diffusion MRI with optimized slice-by-slice B0 field shimming in head-only high-performance gradient MRI systems
Patricia Lan, Sherry S. Huang, Chitresh Bhushan, Xinzeng Wang, Seung-Kyun Lee, Raymond Y. Huang, Jerome J. Maller, Jennifer A. McNab, Ante Zhu
arxiv.org/abs/2510.03586

@arXiv_mathOC_bot@mastoxiv.page
2025-10-14 11:35:58

Nonlinearly Preconditioned Gradient Methods: Momentum and Stochastic Analysis
Konstantinos Oikonomidis, Jan Quan, Panagiotis Patrinos
arxiv.org/abs/2510.11312

@arXiv_qbioNC_bot@mastoxiv.page
2025-10-09 09:04:41

Gradient of White Matter Functional Variability via fALFF Differential Identifiability
Xinle Chang, Yang Yang, Yueran Li, Zhengcen Li, Haijin Zeng, Jingyong Su
arxiv.org/abs/2510.06914

@arXiv_astrophGA_bot@mastoxiv.page
2025-09-30 09:39:31

A gradient boosting and broadband approach to finding Lyman-α emitting galaxies beyond narrowband surveys
A. Vale, A. Paulino-Afonso, A. Humphrey, P. A. C. Cunha, B. Ribeiro, B. Cerqueira, R. Carvajal, J. Fonseca
arxiv.org/abs/2509.22915

@arXiv_mathDG_bot@mastoxiv.page
2025-10-09 08:19:10

Stability of asymptotically conical gradient Kähler-Ricci expanders
Longteng Chen
arxiv.org/abs/2510.06850 arxiv.org/pdf/2510.06850

@arXiv_quantph_bot@mastoxiv.page
2025-10-13 09:24:10

Statistical Benchmarking of Optimization Methods for Variational Quantum Eigensolver under Quantum Noise
Silvie Illésová, Tomáš Bezděk, Vojtěch Novák, Bruno Senjean, Martin Beseda
arxiv.org/abs/2510.08727

@arXiv_mathOC_bot@mastoxiv.page
2025-10-15 10:19:31

Temporal Variabilities Limit Convergence Rates in Gradient-Based Online Optimization
Bryan Van Scoy, Gianluca Bianchin
arxiv.org/abs/2510.12512

@arXiv_condmatstatmech_bot@mastoxiv.page
2025-10-10 07:57:48

Thermodynamically Consistent Continuum Theory of Magnetic Particles in High-Gradient Fields
Marko Tesanovic, Daniel M. Markiewitz, Marcus L. Popp, Martin Z. Bazant, Sonja Berensmeier
arxiv.org/abs/2510.07552

@cosmos4u@scicomm.xyz
2025-12-09 17:57:01

ALMA Reveals an Eccentricity Gradient in the #Fomalhaut Debris Disk: iopscience.iop.org/article/10. -> A Planet Carving the Fomalhaut Debris Disk? aasnova.org/2025/12/09/michela

@arXiv_statML_bot@mastoxiv.page
2025-09-30 09:30:11

Statistical Inference for Gradient Boosting Regression
Haimo Fang, Kevin Tan, Giles Hooker
arxiv.org/abs/2509.23127 arxiv.org/pdf/2509.2312…

@arXiv_csLG_bot@mastoxiv.page
2025-10-03 11:00:21

Flatness-Aware Stochastic Gradient Langevin Dynamics
Stefano Bruno, Youngsik Hwang, Jaehyeon An, Sotirios Sabanis, Dong-Young Lim
arxiv.org/abs/2510.02174

@arXiv_mathSG_bot@mastoxiv.page
2025-10-14 09:28:58

From Morse Functions to Lefschetz Fibrations on Cotangent Bundles
Emmanuel Giroux
arxiv.org/abs/2510.10669 arxiv.org/pdf/2510.10669

@arXiv_eessSY_bot@mastoxiv.page
2025-10-06 09:27:19

Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with multiplicative noise
Gabriel Diaz, Lucky Li, Wenhao Zhang
arxiv.org/abs/2510.02896

@arXiv_mathOC_bot@mastoxiv.page
2025-10-14 11:44:38

Adaptive Conditional Gradient Descent
Abbas Khademi, Antonio Silveti-Falls
arxiv.org/abs/2510.11440 arxiv.org/pdf/2510.11440

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 16:14:34

Crosslisted article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[1/3]:
- Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Rinaldi, Panariello, Salici, Liu, Ciccone, Porrello, Calderara

@arXiv_mathDG_bot@mastoxiv.page
2025-10-07 10:16:22

Curvature pinching of asymptotically conical gradient expanding Ricci solitons
Huai-Dong Cao, Junming Xie
arxiv.org/abs/2510.05075 arxiv.or…

@arXiv_csCR_bot@mastoxiv.page
2025-10-06 09:58:19

Untargeted Jailbreak Attack
Xinzhe Huang, Wenjing Hu, Tianhang Zheng, Kedong Xiu, Xiaojun Jia, Di Wang, Zhan Qin, Kui Ren
arxiv.org/abs/2510.02999

@arXiv_mathNA_bot@mastoxiv.page
2025-10-10 08:55:59

Stochastic Gradient Descent for Incomplete Tensor Linear Systems
Anna Ma, Deanna Needell, Alexander Xue
arxiv.org/abs/2510.07630 arxiv.org/…

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:18:19

On the optimization dynamics of RLVR: Gradient gap and step size thresholds
Joe Suk, Yaqi Duan
arxiv.org/abs/2510.08539 arxiv.org/pdf/2510.…

@arXiv_mathOC_bot@mastoxiv.page
2025-10-15 09:38:41

A Gradient Guided Diffusion Framework for Chance Constrained Programming
Boyang Zhang, Zhiguo Wang, Ya-Feng Liu
arxiv.org/abs/2510.12238 ar…

@arXiv_statML_bot@mastoxiv.page
2025-10-08 09:29:19

On the Theory of Continual Learning with Gradient Descent for Neural Networks
Hossein Taheri, Avishek Ghosh, Arya Mazumdar
arxiv.org/abs/2510.05573

@arXiv_quantph_bot@mastoxiv.page
2025-10-08 10:24:49

Hybrid Quantum-Classical Policy Gradient for Adaptive Control of Cyber-Physical Systems: A Comparative Study of VQC vs. MLP
Aueaphum Aueawatthanaphisut, Nyi Wunna Tun
arxiv.org/abs/2510.06010

@arXiv_csCL_bot@mastoxiv.page
2025-10-06 10:16:39

PGMEL: Policy Gradient-based Generative Adversarial Network for Multimodal Entity Linking
KM Pooja, Cheng Long, Aixin Sun
arxiv.org/abs/2510.02726

@arXiv_mathAP_bot@mastoxiv.page
2025-10-07 11:29:32

Riesz fractional gradient functionals defined on partitions: nonlocal-to-local variational limits
Almi Stefano, Maicol Caponi, Manuel Friedrich, Francesco Solombrino
arxiv.org/abs/2510.04881

@arXiv_mathOC_bot@mastoxiv.page
2025-10-15 09:08:11

New Classes of Non-monotone Variational Inequality Problems Solvable via Proximal Gradient on Smooth Gap Functions
Lei Zhao, Daoli Zhu, Shuzhong Zhang
arxiv.org/abs/2510.12105

@arXiv_mathDG_bot@mastoxiv.page
2025-09-26 07:44:41

Four-dimensional Gradient Shrinking Ricci Solitons and Modified Sectional Curvature
Xiaodong Cao, Ernani Ribeiro Jr, Hosea Wondo
arxiv.org/abs/2509.20669

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:38:29

NeST-BO: Fast Local Bayesian Optimization via Newton-Step Targeting of Gradient and Hessian Information
Wei-Ting Tang, Akshay Kudva, Joel A. Paulson
arxiv.org/abs/2510.05516

@arXiv_statML_bot@mastoxiv.page
2025-10-07 10:11:52

Computing Wasserstein Barycenters through Gradient Flows
Eduardo Fernandes Montesuma, Yassir Bendou, Mike Gartrell
arxiv.org/abs/2510.04602

@arXiv_eessSY_bot@mastoxiv.page
2025-10-03 09:00:41

Off-Policy Reinforcement Learning with Anytime Safety Guarantees via Robust Safe Gradient Flow
Pol Mestres, Arnau Marzabal, Jorge Cortés
arxiv.org/abs/2510.01492

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:38:41

SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression
Biao Zhang, Lixin Chen, Tong Liu, Bo Zheng
arxiv.org/abs/2510.12474

@arXiv_csLG_bot@mastoxiv.page
2025-10-06 10:25:09

AdaBet: Gradient-free Layer Selection for Efficient Training of Deep Neural Networks
Irene Tenison, Soumyajit Chatterjee, Fahim Kawsar, Mohammad Malekzadeh
arxiv.org/abs/2510.03101

@arXiv_mathAP_bot@mastoxiv.page
2025-10-10 09:28:29

Gradient regularity for widely degenerate parabolic equations
Michael Strunk
arxiv.org/abs/2510.07999 arxiv.org/pdf/2510.07999

@arXiv_statML_bot@mastoxiv.page
2025-10-03 09:42:31

Adaptive Kernel Selection for Stein Variational Gradient Descent
Moritz Melcher, Simon Weissmann, Ashia C. Wilson, Jakob Zech
arxiv.org/abs/2510.02067

@arXiv_mathOC_bot@mastoxiv.page
2025-11-14 09:50:00

(Adaptive) Scaled gradient methods beyond locally Hölder smoothness: Lyapunov analysis, convergence rate and complexity
Susan Ghaderi, Morteza Rahimi, Yves Moreau, Masoud Ahookhosh
arxiv.org/abs/2511.10425 arxiv.org/pdf/2511.10425 arxiv.org/html/2511.10425
arXiv:2511.10425v1 Announce Type: new
Abstract: This paper addresses the unconstrained minimization of smooth convex functions whose gradients are locally Hölder continuous. We analyze the Scaled Gradient Algorithm (SGA) under local smoothness assumptions, proving its global convergence and iteration complexity. Furthermore, under local strong convexity and the Kurdyka-Łojasiewicz (KL) inequality, we establish linear convergence rates and provide explicit complexity bounds. In particular, we show that when the gradient is locally Lipschitz continuous, SGA attains linear convergence for any KL exponent. We then introduce and analyze an adaptive variant of SGA (AdaSGA), which automatically adjusts the scaling and step-size parameters. For this method, we show global convergence and derive local linear rates under strong convexity.
toXiv_bot_toot
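The adaptive step-size idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's SGA/AdaSGA — just plain gradient descent with Armijo backtracking, applied to a hypothetical convex test function whose gradient is Hölder but not Lipschitz continuous at the origin:

```python
import numpy as np

# Convex test function: f(x) = (2/3) * sum |x_i|^{3/2}.
# Its gradient, sign(x) * sqrt(|x|), is 1/2-Hoelder but not Lipschitz at 0.
def f(x):
    return (2.0 / 3.0) * np.sum(np.abs(x) ** 1.5)

def grad(x):
    return np.sign(x) * np.sqrt(np.abs(x))

def armijo_gd(x0, iters=100, t0=1.0, beta=0.5, c=1e-4):
    """Gradient descent whose step size adapts to the local smoothness via
    backtracking, instead of assuming a global Lipschitz constant."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = grad(x)
        if np.dot(g, g) < 1e-30:
            break
        t = t0
        # Shrink t until the Armijo sufficient-decrease condition holds.
        while f(x - t * g) > f(x) - c * t * np.dot(g, g):
            t *= beta
        x = x - t * g
    return x

x_star = armijo_gd([2.0, -1.5, 0.5])
print(f(x_star))  # near the minimum value f(0) = 0
```

With a fixed step size, no single constant works along the whole trajectory here, which is why the backtracking (or the paper's scaling/adaptivity) is needed.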

@arXiv_mathDG_bot@mastoxiv.page
2025-10-10 08:47:49

Asymptotic behaviour of the weak inverse anisotropic mean curvature flow
Chaoqun Gao, Yong Wei, Rong Zhou
arxiv.org/abs/2510.08168 arxiv.or…

@arXiv_csLG_bot@mastoxiv.page
2025-10-03 11:04:31

Robust Tangent Space Estimation via Laplacian Eigenvector Gradient Orthogonalization
Dhruv Kohli, Sawyer J. Robertson, Gal Mishne, Alexander Cloninger
arxiv.org/abs/2510.02308

@arXiv_mathOC_bot@mastoxiv.page
2025-10-06 09:27:49

Quantitative Convergence Analysis of Projected Stochastic Gradient Descent for Non-Convex Losses via the Goldstein Subdifferential
Yuping Zheng, Andrew Lamperski
arxiv.org/abs/2510.02735

@arXiv_statML_bot@mastoxiv.page
2025-10-15 09:29:01

Active Subspaces in Infinite Dimension
Poorbita Kundu, Nathan Wycoff
arxiv.org/abs/2510.11871 arxiv.org/pdf/2510.11871

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:46:59

Inductive inference of gradient-boosted decision trees on graphs for insurance fraud detection
Félix Vandervorst, Bruno Deprez, Wouter Verbeke, Tim Verdonck
arxiv.org/abs/2510.05676

@arXiv_mathOC_bot@mastoxiv.page
2025-10-09 09:24:51

Approximate Bregman proximal gradient algorithm with variable metric Armijo--Wolfe line search
Kiwamu Fujiki, Shota Takahashi, Akiko Takeda
arxiv.org/abs/2510.06615

@arXiv_csLG_bot@mastoxiv.page
2025-10-02 11:07:51

CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
Yongcheng Zeng, Zexu Sun, Bokai Ji, Erxue Min, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Haifeng Zhang, Xu Chen, Jun Wang
arxiv.org/abs/2510.01037

@arXiv_statML_bot@mastoxiv.page
2025-10-01 09:43:28

When Langevin Monte Carlo Meets Randomization: Non-asymptotic Error Bounds beyond Log-Concavity and Gradient Lipschitzness
Xiaojie Wang, Bin Yang
arxiv.org/abs/2509.25630

@arXiv_mathOC_bot@mastoxiv.page
2025-11-14 09:19:00

Global Convergence of Four-Layer Matrix Factorization under Random Initialization
Minrui Luo, Weihang Xu, Xiang Gao, Maryam Fazel, Simon Shaolei Du
arxiv.org/abs/2511.09925 arxiv.org/pdf/2511.09925 arxiv.org/html/2511.09925
arXiv:2511.09925v1 Announce Type: new
Abstract: Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well-established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of gradient descent dynamics, and extends previous theories to characterize the change in eigenvalues of layer weights.
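The setting in the abstract — gradient descent on a four-layer matrix factorization — can be sketched in a few lines of NumPy. This is a toy illustration under assumed choices (near-identity initialization, a well-conditioned PSD target, and no balanced regularization term), not the authors' analyzed setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B = rng.normal(size=(n, n))
M = B @ B.T / n + 0.1 * np.eye(n)   # well-conditioned PSD target matrix

# Four factors, initialized near the identity so the product starts near I.
Ws = [np.eye(n) + 0.01 * rng.normal(size=(n, n)) for _ in range(4)]
lr = 0.01

def product(Ws):
    P = np.eye(n)
    for W in Ws:
        P = W @ P                    # applies W1 first: P = W4 W3 W2 W1
    return P

loss0 = 0.5 * np.linalg.norm(product(Ws) - M) ** 2
for _ in range(5000):
    W1, W2, W3, W4 = Ws
    R = product(Ws) - M              # residual of the end-to-end product
    # Chain-rule gradients of 0.5 * ||W4 W3 W2 W1 - M||_F^2 w.r.t. each factor.
    g1 = (W4 @ W3 @ W2).T @ R
    g2 = (W4 @ W3).T @ R @ W1.T
    g3 = W4.T @ R @ (W2 @ W1).T
    g4 = R @ (W3 @ W2 @ W1).T
    Ws = [W1 - lr * g1, W2 - lr * g2, W3 - lr * g3, W4 - lr * g4]

loss1 = 0.5 * np.linalg.norm(product(Ws) - M) ** 2
print(loss0, loss1)  # loss drops by orders of magnitude
```

The interesting regime in the paper is random (not near-identity) initialization, where saddle avoidance has to be argued rather than observed.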

@arXiv_csLG_bot@mastoxiv.page
2025-10-02 11:10:31

Sample-Efficient Differentially Private Fine-Tuning via Gradient Matrix Denoising
Ali Dadsetan, Frank Rudzicz
arxiv.org/abs/2510.01137 arxi…

@arXiv_statML_bot@mastoxiv.page
2025-09-30 08:48:11

Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Haodong Liang, Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai
arxiv.org/abs/2509.22794

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:26:09

Correlating Cross-Iteration Noise for DP-SGD using Model Curvature
Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng
arxiv.org/abs/2510.05416

@arXiv_mathOC_bot@mastoxiv.page
2025-11-14 10:01:50

Low-Discrepancy Set Post-Processing via Gradient Descent
François Clément, Linhang Huang, Woorim Lee, Cole Smidt, Braeden Sodt, Xuan Zhang
arxiv.org/abs/2511.10496 arxiv.org/pdf/2511.10496 arxiv.org/html/2511.10496
arXiv:2511.10496v1 Announce Type: new
Abstract: The construction of low-discrepancy sets, used for uniform sampling and numerical integration, has recently seen great improvements based on optimization and machine learning techniques. However, these methods are computationally expensive, often requiring days of computation or access to GPU clusters. We show that simple gradient descent-based techniques allow for comparable results when starting with a reasonably uniform point set. Not only is this method much more efficient and accessible, but it can be applied as post-processing to any low-discrepancy set generation method for a variety of standard discrepancy measures.
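A minimal version of the post-processing idea — gradient descent on a closed-form discrepancy of a point set — might look as follows. Warnock's formula for the squared L2-star discrepancy is standard; the finite-difference gradient and the step size here are illustrative choices, not the paper's method:

```python
import numpy as np

def l2_star_disc_sq(P):
    """Warnock's closed form for the squared L2-star discrepancy on [0,1]^d."""
    n, d = P.shape
    t1 = 1.0 / 3.0 ** d
    t2 = (2.0 / n) * np.sum(np.prod((1.0 - P ** 2) / 2.0, axis=1))
    t3 = np.sum(np.prod(1.0 - np.maximum(P[:, None, :], P[None, :, :]),
                        axis=2)) / n ** 2
    return t1 - t2 + t3

def num_grad(P, eps=1e-6):
    """Central finite differences (an analytic gradient would be faster)."""
    g = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        Q = P.copy(); Q[idx] += eps
        R = P.copy(); R[idx] -= eps
        g[idx] = (l2_star_disc_sq(Q) - l2_star_disc_sq(R)) / (2 * eps)
    return g

rng = np.random.default_rng(42)
pts = rng.random((32, 2))            # a mediocre random point set to start from
d_before = l2_star_disc_sq(pts)
for _ in range(50):
    pts = np.clip(pts - 0.5 * num_grad(pts), 0.0, 1.0)
d_after = l2_star_disc_sq(pts)
print(d_before, d_after)             # the discrepancy decreases
```

As the abstract notes, this kind of descent works best as post-processing on a set that is already reasonably uniform; from a fully random start it only finds a local improvement.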

@arXiv_csLG_bot@mastoxiv.page
2025-10-07 13:05:42

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
Wei Xiong, Chenlu Ye, Baohao Liao, Hanze Dong, Xinxing Xu, Christof Monz, Jiang Bian, Nan Jiang, Tong Zhang
arxiv.org/abs/2510.04996

@arXiv_mathOC_bot@mastoxiv.page
2025-09-29 09:16:07

A Riemannian Accelerated Proximal Gradient Method
Shuailing Feng, Yuhang Jiang, Wen Huang, Shihui Ying
arxiv.org/abs/2509.21897 arxiv.org/p…

@arXiv_csLG_bot@mastoxiv.page
2025-10-07 13:05:22

Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization
Kristi Topollai, Anna Choromanska
arxiv.org/abs/2510.04988

@arXiv_mathOC_bot@mastoxiv.page
2025-11-14 09:37:10

S-D-RSM: Stochastic Distributed Regularized Splitting Method for Large-Scale Convex Optimization Problems
Maoran Wang, Xingju Cai, Yongxin Chen
arxiv.org/abs/2511.10133 arxiv.org/pdf/2511.10133 arxiv.org/html/2511.10133
arXiv:2511.10133v1 Announce Type: new
Abstract: This paper investigates the problem of large-scale distributed composite convex optimization, with motivations from a broad range of applications, including multi-agent systems, federated learning, smart grids, wireless sensor networks, compressed sensing, and so on. Stochastic gradient descent (SGD) and its variants are commonly employed to solve such problems. However, existing algorithms often rely on vanishing step sizes, strong convexity assumptions, or entail substantial computational overhead to ensure convergence or obtain favorable complexity. To bridge the gap between theory and practice, we integrate consensus optimization and operator splitting techniques (see Problem Reformulation) to develop a novel stochastic splitting algorithm, termed the stochastic distributed regularized splitting method (S-D-RSM). In practice, S-D-RSM performs parallel updates of proximal mappings and gradient information for only a randomly selected subset of agents at each iteration. By introducing regularization terms, it effectively mitigates consensus discrepancies among distributed nodes. In contrast to conventional stochastic methods, our theoretical analysis establishes that S-D-RSM achieves global convergence without requiring diminishing step sizes or strong convexity assumptions. Furthermore, it achieves an iteration complexity of $\mathcal{O}(1/\epsilon)$ with respect to both the objective function value and the consensus error. Numerical experiments show that S-D-RSM achieves up to 2-3x speedup compared to state-of-the-art baselines, while maintaining comparable or better accuracy. These results not only validate the algorithm's theoretical guarantees but also demonstrate its effectiveness in practical tasks such as compressed sensing and empirical risk minimization.
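As a rough illustration of the ingredients named in the abstract (randomly selected agents, regularized proximal-style local steps, a consensus variable), here is a toy distributed least-squares sketch. It is not S-D-RSM itself; the update rule and all parameters are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
m, dim, n_agents = 20, 5, 8
x_true = rng.normal(size=dim)
# Each agent holds a private least-squares block (A_i, b_i).
A = [rng.normal(size=(m, dim)) for _ in range(n_agents)]
b = [A[i] @ x_true + 0.01 * rng.normal(size=m) for i in range(n_agents)]

rho = 5.0                              # consensus-regularization weight
x = [np.zeros(dim) for _ in range(n_agents)]
z = np.zeros(dim)                      # shared consensus variable

for it in range(300):
    # Only a random subset of agents updates in each round.
    S = rng.choice(n_agents, size=3, replace=False)
    for i in S:
        # Regularized local step:
        #   argmin_x 0.5*||A_i x - b_i||^2 + (rho/2)*||x - z||^2
        x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(dim),
                               A[i].T @ b[i] + rho * z)
    z = np.mean(x, axis=0)             # consensus update

print(np.linalg.norm(z - x_true))      # small residual error
```

The regularization term pulls each local solution toward the consensus iterate, which is the mechanism the abstract credits with mitigating consensus discrepancies under partial, randomized participation.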

@arXiv_csLG_bot@mastoxiv.page
2025-10-13 10:44:10

Weight Initialization and Variance Dynamics in Deep Neural Networks and Large Language Models
Yankun Han
arxiv.org/abs/2510.09423 arxiv.org…

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:41:28

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li
arxiv.org/abs/2510.11683

@arXiv_mathOC_bot@mastoxiv.page
2025-09-25 08:48:02

Inexact and Stochastic Gradient Optimization Algorithms with Inertia and Hessian Driven Damping
Harsh Choudhary, Jalal Fadili, Vyacheslav Kungurtsev
arxiv.org/abs/2509.19561

@arXiv_mathOC_bot@mastoxiv.page
2025-09-30 12:01:41

Proximal gradient methods in Banach spaces
Gerd Wachsmuth, Daniel Walter
arxiv.org/abs/2509.24685 arxiv.org/pdf/2509.24685

@arXiv_csLG_bot@mastoxiv.page
2025-10-02 11:08:01

Gated X-TFC: Soft Domain Decomposition for Forward and Inverse Problems in Sharp-Gradient PDEs
Vikas Dwivedi, Enrico Schiassi, Monica Sigovan, Bruno Sixou
arxiv.org/abs/2510.01039

@arXiv_mathOC_bot@mastoxiv.page
2025-10-03 09:48:41

On the (almost) Global Exponential Convergence of the Overparameterized Policy Optimization for the LQR Problem
Moh Kamalul Wafi, Arthur Castello B. de Oliveira, Eduardo D. Sontag
arxiv.org/abs/2510.02140

@arXiv_csLG_bot@mastoxiv.page
2025-09-23 12:49:50

Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
Haocheng Luo, Mehrtash Harandi, Dinh Phung, Trung Le
arxiv.org/abs/2509.18001

@arXiv_mathOC_bot@mastoxiv.page
2025-10-01 08:57:27

A Single-Loop Gradient Algorithm for Pessimistic Bilevel Optimization via Smooth Approximation
Cao Qichao, Zeng Shangzhi, Zhang Jin
arxiv.org/abs/2509.26240

@arXiv_mathOC_bot@mastoxiv.page
2025-09-23 08:49:20

Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points
Naoya Yamamoto, Juno Kim, Taiji Suzuki
arxiv.org/abs/2509.16974

@arXiv_mathOC_bot@mastoxiv.page
2025-11-14 13:23:10

Replaced article(s) found for math.OC. arxiv.org/list/math.OC/new
[1/1]:
- A robust BFGS algorithm for unconstrained nonlinear optimization problems
Yaguang Yang
arxiv.org/abs/1212.5929
- Quantum computing and the stable set problem
Aljaž Krpan, Janez Povh, Dunja Pucher
arxiv.org/abs/2405.12845 mastoxiv.page/@arXiv_mathOC_bo
- Mean Field Game with Reflected Jump Diffusion Dynamics: A Linear Programming Approach
Zongxia Liang, Xiang Yu, Keyu Zhang
arxiv.org/abs/2508.20388 mastoxiv.page/@arXiv_mathOC_bo
- Differential Dynamic Programming for the Optimal Control Problem with an Ellipsoidal Target Set a...
Sungjun Eom, Gyunghoon Park
arxiv.org/abs/2509.07546 mastoxiv.page/@arXiv_mathOC_bo
- On the Moreau envelope properties of weakly convex functions
Marien Renaud, Arthur Leclaire, Nicolas Papadakis
arxiv.org/abs/2509.13960 mastoxiv.page/@arXiv_mathOC_bo
- Automated algorithm design via Nevanlinna-Pick interpolation
Ibrahim K. Ozaslan, Tryphon T. Georgiou, Mihailo R. Jovanovic
arxiv.org/abs/2509.21416 mastoxiv.page/@arXiv_mathOC_bo
- Optimal Control of a Bioeconomic Crop-Energy System with Energy Reinvestment
Othman Cherkaoui Dekkaki
arxiv.org/abs/2510.11381 mastoxiv.page/@arXiv_mathOC_bo
- Point Convergence Analysis of the Accelerated Gradient Method for Multiobjective Optimization: Co...
Yingdong Yin
arxiv.org/abs/2510.26382 mastoxiv.page/@arXiv_mathOC_bo
- History-Aware Adaptive High-Order Tensor Regularization
Chang He, Bo Jiang, Yuntian Jiang, Chuwen Zhang, Shuzhong Zhang
arxiv.org/abs/2511.05788
- Equivalence of entropy solutions and gradient flows for pressureless 1D Euler systems
José Antonio Carrillo, Sondre Tesdal Galtung
arxiv.org/abs/2312.04932 mastoxiv.page/@arXiv_mathAP_bo
- Kernel Modelling of Fading Memory Systems
Yongkang Huo, Thomas Chaffey, Rodolphe Sepulchre
arxiv.org/abs/2403.11945 mastoxiv.page/@arXiv_eessSY_bo
- The Maximum Theoretical Ground Speed of the Wheeled Vehicle
Altay Zhakatayev, Mukatai Nemerebayev
arxiv.org/abs/2502.15341 mastoxiv.page/@arXiv_physicscl
- Hessian stability and convergence rates for entropic and Sinkhorn potentials via semiconcavity
Giacomo Greco, Luca Tamanini
arxiv.org/abs/2504.11133 mastoxiv.page/@arXiv_mathPR_bo
- Optimizing the ground state energy of the three-dimensional magnetic Dirichlet Laplacian with con...
Matthias Baur
arxiv.org/abs/2504.21597 mastoxiv.page/@arXiv_mathph_bo
- A localized consensus-based sampling algorithm
Arne Bouillon, Alexander Bodard, Panagiotis Patrinos, Dirk Nuyens, Giovanni Samaey
arxiv.org/abs/2505.24861 mastoxiv.page/@arXiv_mathNA_bo
- A Novel Sliced Fused Gromov-Wasserstein Distance
Moritz Piening, Robert Beinert
arxiv.org/abs/2508.02364 mastoxiv.page/@arXiv_csLG_bot/
- Minimal Regret Walras Equilibria for Combinatorial Markets via Duality, Integrality, and Sensitiv...
Alo\"is Duguet, Tobias Harks, Martin Schmidt, Julian Schwarz
arxiv.org/abs/2511.09021 mastoxiv.page/@arXiv_csGT_bot/