Tootfinder

@arXiv_csLG_bot@mastoxiv.page
2025-08-29 10:18:31

Unbiased Stochastic Optimization for Gaussian Processes on Finite Dimensional RKHS
Neta Shoham, Haim Avron
https://arxiv.org/abs/2508.20588 https://arxiv.o…

Unbiased Stochastic Optimization for Gaussian Processes on Finite Dimensional RKHS
Current methods for stochastic hyperparameter learning in Gaussian Processes (GPs) rely on approximations, such as computing biased stochastic gradients or using inducing points in stochastic variational inference. However, when using such methods we are not guaranteed to converge to a stationary point of the true marginal likelihood. In this work, we propose algorithms for exact stochastic inference of GPs with kernels that induce a Reproducing Kernel Hilbert Space (RKHS) of moderate finite di…
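The sketch below illustrates why a finite-dimensional RKHS helps. It uses the standard exact, full-batch computation and is not the authors' stochastic algorithm. It assumes a kernel of the form k(x, x') = φ(x)ᵀφ(x') with an m-dimensional feature map φ. Under that assumption, the Woodbury and Sylvester determinant identities let the n×n GP log marginal likelihood be evaluated through an m×m system. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def solve_and_logdet(A: Matrix, b: List[float]) -> Tuple[List[float], float]:
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Also returns log|det A|, which equals log det A for the SPD matrices used here.
    """
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy
    logdet = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        logdet += math.log(abs(M[col][col]))
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x, logdet


def lml_direct(Phi: Matrix, y: List[float], noise_var: float) -> float:
    """O(n^3) log marginal likelihood with K = Phi Phi^T + noise_var * I (n x n)."""
    n = len(y)
    K = [[sum(a * b for a, b in zip(Phi[i], Phi[j])) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha, logdet = solve_and_logdet(K, y)
    return -0.5 * sum(a * b for a, b in zip(y, alpha)) - 0.5 * logdet - 0.5 * n * math.log(2 * math.pi)


def lml_feature_space(Phi: Matrix, y: List[float], noise_var: float) -> float:
    """Same quantity via the m x m matrix A = Phi^T Phi + noise_var * I (Woodbury/Sylvester)."""
    n, m = len(y), len(Phi[0])
    A = [[sum(Phi[k][i] * Phi[k][j] for k in range(n)) + (noise_var if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(Phi[k][i] * y[k] for k in range(n)) for i in range(m)]  # Phi^T y
    z, logdet_A = solve_and_logdet(A, b)
    # y^T (Phi Phi^T + s I)^{-1} y = (y^T y - b^T A^{-1} b) / s
    quad = (sum(v * v for v in y) - sum(bi * zi for bi, zi in zip(b, z))) / noise_var
    # det(Phi Phi^T + s I_n) = s^(n - m) * det(A)
    logdet = logdet_A + (n - m) * math.log(noise_var)
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2 * math.pi)
```

The two functions should agree to floating-point precision. The feature-space version costs roughly O(nm²) rather than O(n³).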

@arXiv_csPF_bot@mastoxiv.page
2025-07-29 07:52:01

Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach
Tatsuro Hanyu, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino
https://arxiv.org/abs/2507.20295

Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach
Coherent Ising Machines (CIMs) have recently gained attention as a promising computing model for solving combinatorial optimization problems. In particular, the Chaotic Amplitude Control (CAC) algorithm has demonstrated high solution quality, but its performance is highly sensitive to a large number of hyperparameters, making efficient tuning essential. In this study, we present an algorithm portfolio approach for hyperparameter tuning in CIMs employing Chaotic Amplitude Control with momentum (…

@arXiv_csSE_bot@mastoxiv.page
2025-07-22 10:56:00

Investigating the Role of LLMs Hyperparameter Tuning and Prompt Engineering to Support Domain Modeling
Vladyslav Bulhakov, Giordano d'Aloisio, Claudio Di Sipio, Antinisca Di Marco, Davide Di Ruscio
https://arxiv.org/abs/2507.14735

Investigating the Role of LLMs Hyperparameter Tuning and Prompt Engineering to Support Domain Modeling
The introduction of large language models (LLMs) has enhanced automation in software engineering tasks, including in Model Driven Engineering (MDE). However, using general-purpose LLMs for domain modeling has its limitations. One approach is to adopt fine-tuned models, but this requires significant computational resources and can lead to issues like catastrophic forgetting. This paper explores how hyperparameter tuning and prompt engineering can improve the accuracy of the Llama 3.1 model for…

@arXiv_csCR_bot@mastoxiv.page
2025-08-22 09:33:41

Private Hyperparameter Tuning with Ex-Post Guarantee
Badih Ghazi, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi, Chiyuan Zhang
https://arxiv.org/abs/2508.15183 ht…

Private Hyperparameter Tuning with Ex-Post Guarantee
The conventional approach in differential privacy (DP) literature formulates the privacy-utility trade-off with a "privacy-first" perspective: for a predetermined level of privacy, a certain utility is achievable. However, practitioners often operate under a "utility-first" paradigm, prioritizing a desired level of utility and then determining the corresponding privacy cost. Wu et al. [2019] initiated a formal study of this "utility-first" perspective by introducing ex-post DP. They demonstra…

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:58:19

Bridging Offline and Online Reinforcement Learning for LLMs
Jack Lanchantin, Angelica Chen, Janice Lan, Xian Li, Swarnadeep Saha, Tianlu Wang, Jing Xu, Ping Yu, Weizhe Yuan, Jason E Weston, Sainbayar Sukhbaatar, Ilia Kulikov
https://arxiv.org/abs/2506.21495 https://arxiv.org/pdf/2506.21495 https://arxiv.org/html/2506.21495
arXiv:2506.21495v1 Announce Type: new
Abstract: We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online to fully online regimes for both verifiable and non-verifiable tasks. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both. Across these settings, we extensively compare online and semi-online Direct Preference Optimization and Group Reward Policy Optimization objectives, and surprisingly find similar performance and convergence between these variants, which all strongly outperform offline methods. We provide a detailed analysis of the training dynamics and hyperparameter selection strategies to achieve optimal results. Finally, we show that multi-tasking with verifiable and non-verifiable rewards jointly yields improved performance across both task types.

@arXiv_physicsoptics_bot@mastoxiv.page
2025-07-23 08:03:32

Hyperparameter-free minimum-lengthscale constraints for topology optimization
Rodrigo Arrieta, Giuseppe Romano, Steven G. Johnson
https://arxiv.org/abs/2507.16108

Hyperparameter-free minimum-lengthscale constraints for topology optimization
The geometric constraints of Zhou et al. (2015) are a widely used technique in topology/freeform optimization to impose minimum lengthscales for manufacturability. However, its efficacy degrades as design binarization is increased, and it requires heuristic tuning of multiple hyperparameters. In this work, we derive analytical hyperparameters from first principles, depending only on the target lengthscale. We present results for both conic and PDE-based filtering schemes, showing that the latte…

@arXiv_csET_bot@mastoxiv.page
2025-07-23 08:04:12

Quantum Annealing Hyperparameter Analysis for Optimal Sensor Placement in Production Environments
Nico Kraus, Marvin Erdmann, Alexander Kuzmany, Daniel Porawski, Jonas Stein
https://arxiv.org/abs/2507.16584

Quantum Annealing Hyperparameter Analysis for Optimal Sensor Placement in Production Environments
To increase efficiency in automotive manufacturing, newly produced vehicles can move autonomously from the production line to the distribution area. This requires an optimal placement of sensors to ensure full coverage while minimizing the number of sensors used. The underlying optimization problem poses a computational challenge due to its large-scale nature. Currently, classical solvers rely on heuristics, often yielding non-optimal solutions for large instances, resulting in suboptimal senso…

@arXiv_condmatstatmech_bot@mastoxiv.page
2025-07-25 08:52:22

Black-box optimization using factorization and Ising machines
Ryo Tamura, Yuya Seki, Yuki Minamoto, Koki Kitai, Yoshiki Matsuda, Shu Tanaka, Koji Tsuda
https://arxiv.org/abs/2507.18003

Black-box optimization using factorization and Ising machines
Black-box optimization (BBO) is used in materials design, drug discovery, and hyperparameter tuning in machine learning. The world is experiencing several of these problems. In this review, a factorization machine with quantum annealing or with quadratic-optimization annealing (FMQA) algorithm to realize fast computations of BBO using Ising machines (IMs) is discussed. The FMQA algorithm uses a factorization machine (FM) as a surrogate model for BBO. The FM model can be directly transformed int…
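Here is a minimal sketch of the surrogate-to-QUBO step described above. It assumes a standard second-order factorization machine on binary inputs x ∈ {0,1}ⁿ. Because x_i² = x_i, the FM prediction is exactly a QUBO energy: the diagonal holds the linear weights and the upper triangle holds the pairwise factor products ⟨v_i, v_j⟩. Brute-force enumeration stands in for the Ising machine here, and all function names are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fm_predict(x: Sequence[int], w0: float, w: Sequence[float], V: Matrix) -> float:
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    n = len(x)
    pairwise = sum(sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
                   for i in range(n) for j in range(i + 1, n))
    return w0 + sum(wi * xi for wi, xi in zip(w, x)) + pairwise


def fm_to_qubo(w: Sequence[float], V: Matrix) -> Matrix:
    """Upper-triangular Q such that x^T Q x + w0 equals the FM prediction for binary x."""
    n = len(w)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = w[i]  # linear term folds onto the diagonal since x_i^2 == x_i
        for j in range(i + 1, n):
            Q[i][j] = sum(a * b for a, b in zip(V[i], V[j]))
    return Q


def qubo_energy(x: Sequence[int], Q: Matrix, offset: float = 0.0) -> float:
    n = len(x)
    return offset + sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_minimum(Q: Matrix) -> Tuple[int, ...]:
    """Exhaustive stand-in for an Ising machine; only feasible for small n."""
    return min(itertools.product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(x, Q))
```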

@arXiv_csCV_bot@mastoxiv.page
2025-08-22 10:08:21

From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations
Anthony Bisulco, Rahul Ramesh, Randall Balestriero, Pratik Chaudhari
https://arxiv.org/abs/2508.15404

From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations
Masked Autoencoders (MAEs) have emerged as a powerful pretraining technique for vision foundation models. Despite their effectiveness, they require extensive hyperparameter tuning (masking ratio, patch size, encoder/decoder layers) when applied to novel datasets. While prior theoretical works have analyzed MAEs in terms of their attention patterns and hierarchical latent variable models, the connection between MAE hyperparameters and performance on downstream tasks is relatively unexplored. Thi…

@arXiv_csLG_bot@mastoxiv.page
2025-08-25 10:01:50

Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
David Chanin, Adrià Garriga-Alonso
https://arxiv.org/abs/2508.16560 https://

Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
Sparse Autoencoders (SAEs) extract features from LLM internal activations, meant to correspond to single concepts. A core SAE training hyperparameter is L0: how many features should fire per token on average. Existing work compares SAE algorithms using sparsity--reconstruction tradeoff plots, implying L0 is a free parameter with no single correct value. In this work we study the effect of L0 on BatchTopK SAEs, and show that if L0 is not set precisely, the SAE fails to learn the underlying featu…
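The sketch below shows what a batch-level top-k constraint does. It assumes that BatchTopK keeps the k × B largest positive pre-activations across a batch of B tokens, rather than k per token. Under that rule the per-token count varies, while the batch-average L0 equals k (given enough positive values). The names are illustrative and are not taken from any SAE library.

```python
from typing import List

Matrix = List[List[float]]


def batch_topk(acts: Matrix, k: int) -> Matrix:
    """Keep the k * batch_size largest pre-activations across the whole batch."""
    budget = k * len(acts)
    # Rank (value, token, feature) triples globally, largest first.
    flat = [(v, r, c) for r, row in enumerate(acts) for c, v in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    out = [[0.0] * len(row) for row in acts]
    for v, r, c in flat[:budget]:
        if v > 0:  # ReLU-style: never keep non-positive activations
            out[r][c] = v
    return out


def mean_l0(codes: Matrix) -> float:
    """Average number of non-zero features per token."""
    return sum(sum(1 for v in row if v != 0.0) for row in codes) / len(codes)
```

With `acts = [[0.9, 0.8, 0.1], [0.2, 0.05, 0.0]]` and `k=1`, the first token keeps two features and the second keeps none. The average L0 is still 1.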

@arXiv_mathOC_bot@mastoxiv.page
2025-07-23 09:12:02

Learning Acceleration Algorithms for Fast Parametric Convex Optimization with Certified Robustness
Rajiv Sambharya, Jinho Bok, Nikolai Matni, George Pappas
https://arxiv.org/abs/2507.16264

Learning Acceleration Algorithms for Fast Parametric Convex Optimization with Certified Robustness
We develop a machine-learning framework to learn hyperparameter sequences for accelerated first-order methods (e.g., the step size and momentum sequences in accelerated gradient descent) to quickly solve parametric convex optimization problems with certified robustness. We obtain a strong form of robustness guarantee -- certification of worst-case performance over all parameters within a set after a given number of iterations -- through regularization-based training. The regularization term is …
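For orientation, here is what a step-size sequence α_k and a momentum sequence β_k parameterize in a Nesterov-style accelerated gradient method. The sketch takes both sequences as inputs, since those are what a learned schedule would supply. The classical choice α_k = 1/L, β_k = k/(k+3) in the usage test is an assumption for illustration, not a learned schedule.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def accelerated_gd(grad: Callable[[Vector], Vector], x0: Vector,
                   step_sizes: Sequence[float], momenta: Sequence[float]) -> Vector:
    """Accelerated gradient descent with caller-supplied alpha_k and beta_k sequences."""
    x_prev, x = list(x0), list(x0)
    for alpha, beta in zip(step_sizes, momenta):
        # Momentum (extrapolation) step: y = x_k + beta_k * (x_k - x_{k-1}).
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        # Gradient step from the extrapolated point.
        g = grad(y)
        x_prev, x = x, [yi - alpha * gi for yi, gi in zip(y, g)]
    return x
```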

@arXiv_statML_bot@mastoxiv.page
2025-08-21 08:10:59

Evaluation and Optimization of Leave-one-out Cross-validation for the Lasso
Ryan Burn
https://arxiv.org/abs/2508.14368 https://arxiv.org/pdf/2508.14368

Evaluation and Optimization of Leave-one-out Cross-validation for the Lasso
I develop an algorithm to produce the piecewise quadratic that computes leave-one-out cross-validation for the lasso as a function of its hyperparameter. The algorithm can be used to find exact hyperparameters that optimize leave-one-out cross-validation either globally or locally, and its practicality is demonstrated on real-world data sets.
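The piecewise-quadratic structure is easy to see in the simplest case. This sketch assumes a single feature, no intercept, and the objective ½‖y − βx‖² + λ|β|. Each held-out fit is then a soft-threshold, sign(s) · max(|s| − λ, 0) / q. The code evaluates leave-one-out error by brute force at a given λ and is not the paper's algorithm for the general lasso.

```python
import math
from typing import Sequence


def soft_threshold(s: float, lam: float) -> float:
    """sign(s) * max(|s| - lam, 0)."""
    return math.copysign(max(abs(s) - lam, 0.0), s)


def loo_cv_lasso_1d(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Brute-force leave-one-out squared error for a one-feature, no-intercept lasso.

    Each held-out fit minimizes 0.5 * sum_j (y_j - beta x_j)^2 + lam * |beta| over the
    remaining points, giving beta = soft_threshold(s, lam) / q with s = sum_j x_j y_j
    and q = sum_j x_j^2. Assumes q > 0 after removing any single point.
    """
    s = sum(xi * yi for xi, yi in zip(x, y))
    q = sum(xi * xi for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        # Downdate the sufficient statistics instead of refitting from scratch.
        beta = soft_threshold(s - xi * yi, lam) / (q - xi * xi)
        total += (yi - beta * xi) ** 2
    return total / len(x)
```

Each held-out β is piecewise linear in λ, with kinks where |s| = λ. The LOO error is therefore piecewise quadratic in λ. A grid search with this function is the brute-force alternative to computing the pieces exactly.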

@arXiv_csGR_bot@mastoxiv.page
2025-08-07 07:33:23

RLGS: Reinforcement Learning-Based Adaptive Hyperparameter Tuning for Gaussian Splatting
Zhan Li, Huangying Zhan, Changyang Li, Qingan Yan, Yi Xu
https://arxiv.org/abs/2508.04078

RLGS: Reinforcement Learning-Based Adaptive Hyperparameter Tuning for Gaussian Splatting
Hyperparameter tuning in 3D Gaussian Splatting (3DGS) is a labor-intensive and expert-driven process, often resulting in inconsistent reconstructions and suboptimal results. We propose RLGS, a plug-and-play reinforcement learning framework for adaptive hyperparameter tuning in 3DGS through lightweight policy modules, dynamically adjusting critical hyperparameters such as learning rates and densification thresholds. The framework is model-agnostic and seamlessly integrates into existing 3DGS pip…

@arXiv_csNE_bot@mastoxiv.page
2025-07-22 07:49:50

Analyzing Internal Activity and Robustness of SNNs Across Neuron Parameter Space
Szymon Mazurek, Jakub Caputa, Maciej Wielgosz
https://arxiv.org/abs/2507.14757

Analyzing Internal Activity and Robustness of SNNs Across Neuron Parameter Space
Spiking Neural Networks (SNNs) offer energy-efficient and biologically plausible alternatives to traditional artificial neural networks, but their performance depends critically on the tuning of neuron model parameters. In this work, we identify and characterize an operational space - a constrained region in the neuron hyperparameter domain (specifically membrane time constant tau and voltage threshold vth) - within which the network exhibits meaningful activity and functional behavior. Operati…
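To make the roles of the membrane time constant τ and the threshold v_th concrete, here is a sketch of a single leaky integrate-and-fire neuron under constant input, integrated with forward Euler. The model form, reset rule, and units are assumptions, and the paper's neuron model may differ. With drive I, the membrane potential relaxes toward I, so any v_th above I yields a silent neuron. That silence marks one edge of an "operational" region.

```python
def lif_spike_count(tau: float, v_th: float, current: float,
                    steps: int = 1000, dt: float = 1.0, v_reset: float = 0.0) -> int:
    """Count spikes of a leaky integrate-and-fire neuron under constant input.

    Assumed (hypothetical) model: tau * dv/dt = -v + current, with a spike and a
    reset to v_reset whenever v >= v_th. Forward Euler needs dt < tau for stability.
    """
    v = v_reset
    spikes = 0
    for _ in range(steps):
        v += (dt / tau) * (-v + current)  # relax toward the input level `current`
        if v >= v_th:
            spikes += 1
            v = v_reset
    return spikes
```

Sweeping `tau` and `v_th` over a grid and recording where the count is neither zero nor saturated gives a crude picture of such a region.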

@arXiv_eessIV_bot@mastoxiv.page
2025-07-09 07:44:22

Dual-Attention U-Net++ with Class-Specific Ensembles and Bayesian Hyperparameter Optimization for Precise Wound and Scale Marker Segmentation
Daniel Cieślak, Miriam Reca, Olena Onyshchenko, Jacek Rumiński
https://arxiv.org/abs/2507.05314

Dual-Attention U-Net++ with Class-Specific Ensembles and Bayesian Hyperparameter Optimization for Precise Wound and Scale Marker Segmentation
Accurate segmentation of wounds and scale markers in clinical images remains a significant challenge, crucial for effective wound management and automated assessment. In this study, we propose a novel dual-attention U-Net++ architecture, integrating channel-wise (SCSE) and spatial attention mechanisms to address severe class imbalance and variability in medical images effectively. Initially, extensive benchmarking across diverse architectures and encoders via 5-fold cross-validation identified Eff…

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:05:00

In-Context Decision Making for Optimizing Complex AutoML Pipelines
Amir Rezaei Balef, Katharina Eggensperger
https://arxiv.org/abs/2508.13657 https://arxiv…

In-Context Decision Making for Optimizing Complex AutoML Pipelines
Combined Algorithm Selection and Hyperparameter Optimization (CASH) has been fundamental to traditional AutoML systems. However, with the advancements of pre-trained models, modern ML workflows go beyond hyperparameter optimization and often require fine-tuning, ensembling, and other adaptation techniques. While the core challenge of identifying the best-performing model for a downstream task remains, the increasing heterogeneity of ML pipelines demands novel AutoML approaches. This work extend…

@arXiv_physicscompph_bot@mastoxiv.page
2025-06-23 09:13:00

Great Restraining Wall in Multidimentional Collective Variable Space
Zhijun Pan, Maodong Li, Dechin Chen, Yi Isaac Yang
https://arxiv.org/abs/2506.17043 ht…

Great Restraining Wall in Multidimentional Collective Variable Space
Enhanced sampling methods are pivotal for exploring rare events in molecular dynamics (MD), yet face challenges in high-dimensional collective variable (CV) spaces where exhaustive sampling becomes computationally prohibitive. While techniques like metadynamics (MetaD) and path-CV enable targeted free energy surface (FES) reconstruction, they often struggle with confinement stability, hyperparameter sensitivity, and geometric flexibility. This work introduces the Great Restraining Wall (GW) met…

@arXiv_csHC_bot@mastoxiv.page
2025-07-17 09:16:00

Dataset-Adaptive Dimensionality Reduction
Hyeon Jeon, Jeongin Park, Soohyun Lee, Dae Hyun Kim, Sungbok Shin, Jinwook Seo
https://arxiv.org/abs/2507.11984 h…

Dataset-Adaptive Dimensionality Reduction
Selecting the appropriate dimensionality reduction (DR) technique and determining its optimal hyperparameter settings that maximize the accuracy of the output projections typically involves extensive trial and error, often resulting in unnecessary computational overhead. To address this challenge, we propose a dataset-adaptive approach to DR optimization guided by structural complexity metrics. These metrics quantify the intrinsic complexity of a dataset, predicting whether higher-dimensional s…

@arXiv_hepph_bot@mastoxiv.page
2025-08-20 08:33:40

Harnessing data-driven methods for precise model independent event shape estimation in relativistic heavy-ion collisions
Dipankar Basak, H. Hushnud, Kalyan Dey
https://arxiv.org/abs/2508.13349

Harnessing data-driven methods for precise model independent event shape estimation in relativistic heavy-ion collisions
This study demonstrates the application of supervised machine learning (ML) techniques to distinguish between isotropic and jet-like event topologies in heavy-ion collisions via the spherocity observable. State-of-the-art ML algorithms, optimized through systematic hyperparameter tuning, are employed to predict both traditional transverse spherocity $S_{0}$ and unweighted transverse spherocity $S_{0}^{p_{\rm T}=1}$ directly from raw event data. Moreover, the results from this study demonstrated…
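For reference, transverse spherocity is commonly defined as S₀ = (π²/4) · min over unit vectors n̂ of (Σᵢ |p⃗_{T,i} × n̂| / Σᵢ p_{T,i})². The unweighted variant S₀^{p_T=1} sets every p_T to 1, so only track directions count. S₀ approaches 0 for pencil-like (jet-like) events and 1 for isotropic ones. For simplicity, this sketch approximates the minimization with a fine angular grid. It computes the target observable only, not the ML models that predict it.

```python
import math
from typing import List, Tuple


def spherocity(pt_vectors: List[Tuple[float, float]], unweighted: bool = False,
               n_steps: int = 1800) -> float:
    """Transverse spherocity S0 from (px, py) pairs, minimizing over a grid of n-hat angles.

    With unweighted=True every track is rescaled to unit p_T (the S0^{pT=1} variant).
    Assumes non-zero transverse momenta.
    """
    if unweighted:
        vecs = [(px / math.hypot(px, py), py / math.hypot(px, py)) for px, py in pt_vectors]
    else:
        vecs = list(pt_vectors)
    total_pt = sum(math.hypot(px, py) for px, py in vecs)
    best = math.inf
    for k in range(n_steps):
        # n-hat and -n-hat give the same value, so angles in [0, pi) suffice.
        phi = math.pi * k / n_steps
        nx, ny = math.cos(phi), math.sin(phi)
        # |p x n| in 2D is |px * ny - py * nx|.
        best = min(best, sum(abs(px * ny - py * nx) for px, py in vecs))
    return (math.pi ** 2 / 4.0) * (best / total_pt) ** 2
```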

@arXiv_mathOC_bot@mastoxiv.page
2025-05-30 10:14:28

This https://arxiv.org/abs/2412.06481 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_mat…

DeePC-Hunt: Data-enabled Predictive Control Hyperparameter Tuning via Differentiable Optimization
This paper introduces Data-enabled Predictive Control Hyperparameter Tuning via Differentiable Optimization (DeePC-Hunt), a backpropagation-based method for automatic hyperparameter tuning of the DeePC algorithm. The necessity for such a method arises from the importance of hyperparameter selection to achieve satisfactory closed-loop DeePC performance. The standard methods for hyperparameter selection are to either optimize the open-loop performance, or use manual guess-and-check. Optimizing th…

@arXiv_csCR_bot@mastoxiv.page
2025-08-11 08:59:00

Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark
Minghao Shao, Nanda Rani, Kimberly Milner, Haoran Xi, Meet Udeshi, Saksham Aggarwal, Venkata Sai Charan Putrevu, Sandeep Kumar Shukla, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique
https://arx…

Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark
Recent advances in LLM agentic systems have improved the automation of offensive security tasks, particularly for Capture the Flag (CTF) challenges. We systematically investigate the key factors that drive agent success and provide a detailed recipe for building effective LLM-based offensive security agents. First, we present CTFJudge, a framework leveraging LLM as a judge to analyze agent trajectories and provide granular evaluation across CTF solving steps. Second, we propose a novel metric, …

@arXiv_csIR_bot@mastoxiv.page
2025-08-15 09:30:13

CrossDenoise: Denoising Implicit Feedback via a Lightweight Entity-Aware Synergistic Framework
Ze Liu, Xianquan Wang, Shuochen Liu, Jie Ma, Huibo Xu, Yupeng Han, Zhe Yang, Kai Zhang, Longfei Li, Jun Zhou
https://arxiv.org/abs/2508.10851

CrossDenoise: Denoising Implicit Feedback via a Lightweight Entity-Aware Synergistic Framework
Recommender systems heavily rely on implicit feedback, which is inherently noisy due to false positives and negatives, severely degrading recommendation accuracy. Existing denoising strategies often overlook entity-aware modeling, suffer from high computational overhead, or demand excessive hyperparameter tuning, limiting their real-world applicability. We propose CrossDenoise, a novel and lightweight framework that addresses these challenges by disentangling noise estimation into user-, item-,…

@arXiv_csLG_bot@mastoxiv.page
2025-06-09 10:11:42

carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks
Carolin Benjamins, Helena Graf, Sarah Segel, Difan Deng, Tim Ruhkopf, Leona Hennig, Soham Basu, Neeratyoy Mallik, Edward Bergman, Deyao Chen, François Clément, Matthias Feurer, Katharina Eggensperger, Frank Hutter, Carola Doerr, Marius Lindauer
https://

carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks
Hyperparameter Optimization (HPO) is crucial to develop well-performing machine learning models. In order to ease prototyping and benchmarking of HPO methods, we propose carps, a benchmark framework for Comprehensive Automated Research Performance Studies that allows evaluating N optimizers on M benchmark tasks. In this first release of carps, we focus on the four most important HPO task types: blackbox, multi-fidelity, multi-objective and multi-fidelity-multi-objective. With 3 336 tasks…

@arXiv_statML_bot@mastoxiv.page
2025-08-18 08:29:50

Uniform convergence for Gaussian kernel ridge regression
Paul Dommel, Rajmadan Lakshmanan
https://arxiv.org/abs/2508.11274 https://arxiv.org/pdf/2508.11274…

Uniform convergence for Gaussian kernel ridge regression
This paper establishes the first polynomial convergence rates for Gaussian kernel ridge regression (KRR) with a fixed hyperparameter in both the uniform and the $L^{2}$-norm. The uniform convergence result closes a gap in the theoretical understanding of KRR with the Gaussian kernel, where no such rates were previously known. In addition, we prove a polynomial $L^{2}$-convergence rate in the case where the Gaussian kernel's width parameter is fixed. This also contributes to the broader underst…
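For readers less familiar with the estimator, here is a minimal Gaussian KRR sketch with the kernel width held fixed. It assumes the objective (1/n)Σ(yᵢ − f(xᵢ))² + λ‖f‖². The solution of that objective is f̂(x) = Σᵢ αᵢ k(xᵢ, x) with α = (K + nλI)⁻¹y. The dense Gaussian-elimination solver is included only to keep the example self-contained.

```python
import math
from typing import Callable, List

Vector = List[float]


def gaussian_kernel(x: Vector, z: Vector, width: float) -> float:
    """k(x, z) = exp(-||x - z||^2 / (2 * width^2)), with the width held fixed."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * width ** 2))


def solve(A: List[Vector], b: Vector) -> Vector:
    """Dense Gaussian elimination with partial pivoting (illustrative only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def krr_fit(X: List[Vector], y: Vector, width: float, lam: float) -> Callable[[Vector], float]:
    """Fit f(x) = sum_i alpha_i k(x_i, x) with alpha = (K + n * lam * I)^{-1} y."""
    n = len(X)
    K = [[gaussian_kernel(xi, xj, width) + (n * lam if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    alpha = solve(K, y)

    def predict(x: Vector) -> float:
        return sum(a * gaussian_kernel(xi, x, width) for a, xi in zip(alpha, X))

    return predict
```

A tiny λ nearly interpolates the training data. A large λ shrinks the fit toward zero.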

@arXiv_statME_bot@mastoxiv.page
2025-07-08 12:19:30

Predictive posteriors under hidden confounding
Carlos García Meixide, David Ríos Insua
https://arxiv.org/abs/2507.05170 https://

Predictive posteriors under hidden confounding
Predicting outcomes in external domains is challenging due to hidden confounders that influence both predictors and outcomes, complicating generalization under distribution shifts. Traditional methods often rely on stringent assumptions or overly conservative regularization, compromising estimation and predictive accuracy. Generative Invariance (GI) is a novel framework that facilitates predictions in unseen domains without requiring hyperparameter tuning or knowledge of specific distribution s…

@arXiv_csNE_bot@mastoxiv.page
2025-08-18 08:29:00

SO-PIFRNN: Self-optimization physics-informed Fourier-features randomized neural network for solving partial differential equations
Jiale Linghu, Weifeng Gao, Hao Dong, Yufeng Nie
https://arxiv.org/abs/2508.10921

SO-PIFRNN: Self-optimization physics-informed Fourier-features randomized neural network for solving partial differential equations
This study proposes a self-optimization physics-informed Fourier-features randomized neural network (SO-PIFRNN) framework, which significantly improves the numerical solving accuracy of PDEs through a hyperparameter optimization mechanism. The framework employs a bi-level optimization architecture: the outer-level optimization utilizes a multi-strategy collaborated particle swarm optimization (MSC-PSO) algorithm to search for optimal hyperparameters of physics-informed Fourier-features randomized…

@arXiv_csLG_bot@mastoxiv.page
2025-08-21 10:16:30

Successive Halving with Learning Curve Prediction via Latent Kronecker Gaussian Processes
Jihao Andreas Lin, Nicolas Mayoraz, Steffen Rendle, Dima Kuzmin, Emil Praun, Berivan Isik
https://arxiv.org/abs/2508.14818

Successive Halving with Learning Curve Prediction via Latent Kronecker Gaussian Processes
Successive Halving is a popular algorithm for hyperparameter optimization which allocates exponentially more resources to promising candidates. However, the algorithm typically relies on intermediate performance values to make resource allocation decisions, which can cause it to prematurely prune slow starters that would eventually become the best candidate. We investigate whether guiding Successive Halving with learning curve predictions based on Latent Kronecker Gaussian Processes can overcom…
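Here is a bare-bones Successive Halving loop for orientation. Each round evaluates every surviving configuration at the current budget, keeps the best 1/η fraction, and multiplies the budget by η. The loop ranks purely on the latest intermediate score, which is exactly what can prune slow starters; a learning-curve predictor would replace that ranking step. The function and parameter names are illustrative. Lower scores are assumed to be better, and configurations are assumed to be distinct and hashable.

```python
from typing import Callable, Hashable, List


def successive_halving(configs: List[Hashable],
                       evaluate: Callable[[Hashable, int], float],
                       min_budget: int = 1, eta: int = 2) -> Hashable:
    """Return the last surviving configuration (lower score = better)."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every survivor at the current budget (e.g. epochs trained).
        scores = {c: evaluate(c, budget) for c in survivors}
        # Keep the top 1/eta fraction, ranked only on the current score.
        survivors.sort(key=lambda c: scores[c])
        survivors = survivors[:max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]
```

With 8 configurations and η = 2, the loop runs 8 evaluations at budget 1, then 4 at budget 2, then 2 at budget 4.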

@arXiv_eessSP_bot@mastoxiv.page
2025-08-05 11:03:00

The Role of Review Process Failures in Affective State Estimation: An Empirical Investigation of DEAP Dataset
Nazmun N Khan, Taylor Sweet, Chase A Harvey, Calder Knapp, Dean J. Krusienski, David E Thompson
https://arxiv.org/abs/2508.02417

The Role of Review Process Failures in Affective State Estimation: An Empirical Investigation of DEAP Dataset
The reliability of affective state estimation using EEG data is in question, given the variability in reported performance and the lack of standardized evaluation protocols. To investigate this, we reviewed 101 studies, focusing on the widely used DEAP dataset for emotion recognition. Our analysis revealed widespread methodological issues that include data leakage from improper segmentation, biased feature selection, flawed hyperparameter optimization, neglect of class imbalance, and insufficie…

@arXiv_csSD_bot@mastoxiv.page
2025-07-04 08:47:11

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
Mostafa Sadeghi (MULTISPEECH), Jean-Eudes Ayilo (MULTISPEECH), Romain Serizel (MULTISPEECH), Xavier Alameda-Pineda (ROBOTLEARN)
https://arxiv.org/abs/2507.02391

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed likelihood score, combined with the unconditional score via a trade-off hyperparameter. In this work, we propose two alternative algorithms that directly model the conditional reverse transition distribution of diffusion states. The first method integrates the diffusi…

@arXiv_statML_bot@mastoxiv.page
2025-08-18 08:39:50

ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization
Shengzhuang Chen, Xu Ouyang, Michael Arthur Leopold Pearce, Thomas Hartvigsen, Jonathan Richard Schwarz
https://arxiv.org/abs/2508.11551

ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization
Determining the optimal data mixture for large language model training remains a challenging problem with an outsized impact on performance. In practice, language model developers continue to rely on heuristic exploration since no learning-based approach has emerged as a reliable solution. In this work, we propose to view the selection of training data mixtures as a black-box hyperparameter optimization problem, for which Bayesian Optimization is a well-established class of appropriate algorith…

@arXiv_eessSY_bot@mastoxiv.page
2025-07-04 08:28:21

Enhancing Power Flow Estimation with Topology-Aware Gated Graph Neural Networks
Shrenik Jadhav, Birva Sevak, Srijita Das, Wencong Su, Van-Hai Bui
https://arxiv.org/abs/2507.02078 …

Enhancing Power Flow Estimation with Topology-Aware Gated Graph Neural Networks
Accurate and scalable surrogate models for AC power flow are essential for real-time grid monitoring, contingency analysis, and decision support in increasingly dynamic and inverter-dominated power systems. However, most existing surrogates fall short of practical deployment due to their limited capacity to capture long-range nonlinear dependencies in meshed transmission networks and their weak enforcement of physical laws. These models often require extensive hyperparameter tuning, exhibit poo…

@arXiv_condmatdisnn_bot@mastoxiv.page
2025-07-11 09:25:31

A statistical physics framework for optimal learning
Francesca Mignacco, Francesco Mori
https://arxiv.org/abs/2507.07907 https://arxi…

A statistical physics framework for optimal learning
Learning is a complex dynamical process shaped by a range of interconnected decisions. Careful design of hyperparameter schedules for artificial neural networks or efficient allocation of cognitive resources by biological learners can dramatically affect performance. Yet, theoretical understanding of optimal learning strategies remains sparse, especially due to the intricate interplay between evolving meta-parameters and nonlinear learning dynamics. The search for optimal protocols is further h…

@arXiv_physicsdataan_bot@mastoxiv.page
2025-08-12 09:19:33

Error Breakdown and Sensitivity Analysis of Dynamical Quantities in Markov State Models
Yehor Tuchkov, Luke Evans, Sonya M. Hanson, Erik H. Thiede
https://arxiv.org/abs/2508.06735

Error Breakdown and Sensitivity Analysis of Dynamical Quantities in Markov State Models
Markov state models (MSMs) are widely employed to analyze the kinetics of complex systems. But despite their effectiveness in many applications, MSMs are prone to systematic or statistical errors, often exacerbated by suboptimal hyperparameter choice. In this paper, we attempt to understand how these choices affect the error of estimates of mean first-passage times and committors, key quantities in chemical rate theory. We first evaluate the performance of the recently introduced "stopped-proce…
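Both quantities have simple linear-algebra definitions once an MSM transition matrix T, estimated at lag τ, is available:

- Mean first-passage times to a target set B satisfy mᵢ = 0 for i ∈ B and mᵢ = τ + Σⱼ Tᵢⱼ mⱼ otherwise.
- The forward committor from A to B satisfies qᵢ = 0 on A, qᵢ = 1 on B, and qᵢ = Σⱼ Tᵢⱼ qⱼ elsewhere.

The sketch solves those systems directly. It assumes a row-stochastic T and is not the error analysis or estimator studied in the paper.

```python
from typing import List, Set

Matrix = List[List[float]]


def _solve(A: Matrix, b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mean_first_passage_times(T: Matrix, target: Set[int], lag: float = 1.0) -> List[float]:
    """m_i = 0 on the target set, m_i = lag + sum_j T_ij m_j elsewhere."""
    free = [i for i in range(len(T)) if i not in target]
    # Target states contribute m_j = 0, so only the free block of T enters.
    A = [[(1.0 if i == j else 0.0) - T[i][j] for j in free] for i in free]
    by_state = dict(zip(free, _solve(A, [lag] * len(free))))
    return [by_state.get(i, 0.0) for i in range(len(T))]


def committor(T: Matrix, source: Set[int], target: Set[int]) -> List[float]:
    """Forward committor: q = 0 on source, 1 on target, q_i = sum_j T_ij q_j elsewhere."""
    free = [i for i in range(len(T)) if i not in source and i not in target]
    A = [[(1.0 if i == j else 0.0) - T[i][j] for j in free] for i in free]
    # The known q_j = 1 terms (target states) move to the right-hand side.
    b = [sum(T[i][j] for j in target) for i in free]
    by_state = dict(zip(free, _solve(A, b)))
    return [1.0 if i in target else by_state.get(i, 0.0) for i in range(len(T))]
```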

@arXiv_csCE_bot@mastoxiv.page
2025-07-30 07:33:51

Improving Neural Network Training using Dynamic Learning Rate Schedule for PINNs and Image Classification
D. Veerababu, Ashwin A. Raikar, Prasanta K. Ghosh
https://arxiv.org/abs/2507.21749

Improving Neural Network Training using Dynamic Learning Rate Schedule for PINNs and Image Classification
Training neural networks can be challenging, especially as the complexity of the problem increases. Despite using wider or deeper networks, training them can be a tedious process, especially if a hyperparameter is chosen poorly. The learning rate is one such crucial hyperparameter, and it is usually kept static during the training process. Learning dynamics in complex systems often require a more adaptive approach to the learning rate. This adaptability becomes crucial to effecti…

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:17:40

AutoScale: Linear Scalarization Guided by Multi-Task Optimization Metrics
Yi Yang, Kei Ikemura, Qingwen Zhang, Xiaomeng Zhu, Ci Li, Nazre Batool, Sina Sharif Mansouri, John Folkesson
https://arxiv.org/abs/2508.13979

AutoScale: Linear Scalarization Guided by Multi-Task Optimization Metrics
Recent multi-task learning studies suggest that linear scalarization, when using well-chosen fixed task weights, can achieve performance comparable to, or even better than, that of complex multi-task optimization (MTO) methods. It remains unclear why certain weights yield optimal performance and how to determine these weights without relying on exhaustive hyperparameter search. This paper establishes a direct connection between linear scalarization and MTO methods, revealing through extensive experiments…
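Linear scalarization is a fixed weighted sum of task losses, L(θ) = Σₜ wₜ Lₜ(θ), minimized as a single objective; the open question is how to choose the weights wₜ. The toy sketch below uses two quadratic task losses, for which the scalarized minimizer is the weight-averaged task optimum. This shows directly how the weights steer the solution. The numbers and helper names are hypothetical.

```python
from typing import Callable, Sequence

Loss = Callable[[float], float]


def scalarize(losses: Sequence[Loss], weights: Sequence[float]) -> Loss:
    """Combine task losses into one objective with fixed weights."""
    return lambda theta: sum(w * loss(theta) for w, loss in zip(weights, losses))


def minimize_1d(f: Loss, theta0: float = 0.0, lr: float = 0.1,
                steps: int = 500, eps: float = 1e-6) -> float:
    """Plain gradient descent using a central-difference gradient estimate."""
    theta = theta0
    for _ in range(steps):
        grad = (f(theta + eps) - f(theta - eps)) / (2.0 * eps)
        theta -= lr * grad
    return theta


# Two toy tasks whose individual optima are at 1.0 and 3.0.
task_losses = [lambda t: (t - 1.0) ** 2, lambda t: (t - 3.0) ** 2]
# For quadratic losses the minimizer is the weighted mean of the optima: 0.25*1 + 0.75*3 = 2.5.
theta_star = minimize_1d(scalarize(task_losses, [0.25, 0.75]))
```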

@arXiv_csIR_bot@mastoxiv.page
2025-07-10 09:20:11

SPEAR: Subset-sampled Performance Evaluation via Automated Ground Truth Generation for RAG
Zou Yuheng, Wang Yiran, Tian Yuzhu, Zhu Min, Huang Yanhua
https://arxiv.org/abs/2507.06554

SPEAR: Subset-sampled Performance Evaluation via Automated Ground Truth Generation for RAG
Retrieval-Augmented Generation (RAG) is a core approach for enhancing Large Language Models (LLMs), where the effectiveness of the retriever largely determines the overall response quality of RAG systems. Retrievers encompass a multitude of hyperparameters that significantly impact performance outcomes and demonstrate sensitivity to specific applications. Nevertheless, hyperparameter optimization entails prohibitively high computational expenses. Existing evaluation methods suffer from either p…

@arXiv_mathOC_bot@mastoxiv.page
2025-06-19 09:08:07

On the Effectiveness of Classical Regression Methods for Optimal Switching Problems
Martin Andersson, Benny Avelin, Marcus Olofsson
https://arxiv.org/abs/2506.15436

On the Effectiveness of Classical Regression Methods for Optimal Switching Problems
Simple regression methods provide robust, near-optimal solutions for optimal switching problems in dimensions ranging from 1 to 50. While the theory requires solving intractable PDE systems, the Longstaff-Schwartz algorithm with classical approaches like $k$-NN achieves excellent switching decisions without extensive hyperparameter tuning. Testing eight regression approaches on four benchmark problems, we find that simple methods maintain stable performance across diverse problem characteristic…

@arXiv_nuclth_bot@mastoxiv.page
2025-08-06 08:45:40

Machine Learning-Driven High-Precision Model for $\alpha$-Decay Energy and Half-Life Prediction of superheavy nuclei
Qingning Yuan, Panpan Qi, Xuanpen Xiao, Xue Wang, Juan He, Guimei Long, Zhengwei Duan, Yangyan Dai, Runchao Yan, Gongming Yu, Haitao Yang, Qiang Hu
https://arxiv.org/abs/2508.03155

Machine Learning-Driven High-Precision Model for $\alpha$-Decay Energy and Half-Life Prediction of superheavy nuclei
Based on the Extreme Gradient Boosting (XGBoost) framework, optimized via Bayesian hyperparameter tuning, we investigated the $\alpha$-decay energy and half-life of superheavy nuclei. By incorporating key nuclear structural features (mass number, proton-to-neutron ratio, magic-number proximity, and angular momentum transfer), the optimized model captures essential physical mechanisms governing $\alpha$-decay. On the test set, the model achieves significantly lower mean absolute error (MAE) and root mea…

@arXiv_csGR_bot@mastoxiv.page
2025-06-10 07:38:52

Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
Chuhao Chen, Zhiyang Dou, Chen Wang, Yiming Huang, Anjun Chen, Qiao Feng, Jiatao Gu, Lingjie Liu
https://arxiv.org/abs/2506.06440

Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
Faithfully reconstructing textured shapes and physical properties from videos presents an intriguing yet challenging problem. Significant efforts have been dedicated to advancing such a system identification problem in this area. Previous methods often rely on heavy optimization pipelines with a differentiable simulator and renderer to estimate physical parameters. However, these approaches frequently necessitate extensive hyperparameter tuning for each scene and involve a costly optimization p…

@arXiv_eessIV_bot@mastoxiv.page
2025-07-31 08:17:11

trAIce3D: A Prompt-Driven Transformer Based U-Net for Semantic Segmentation of Microglial Cells from Large-Scale 3D Microscopy Images
MohammadAmin Alamalhoda, Arsalan Firoozi, Alessandro Venturino, Sandra Siegert
https://arxiv.org/abs/2507.22635

trAIce3D: A Prompt-Driven Transformer Based U-Net for Semantic Segmentation of Microglial Cells from Large-Scale 3D Microscopy Images
The shape of a cell contains essential information about its function within the biological system. Segmenting these structures from large-scale 3D microscopy images is challenging, limiting clinical insights, especially for microglia, immune-associated cells involved in neurodegenerative diseases. Existing segmentation methods mainly focus on cell bodies, struggle with overlapping structures, perform poorly on noisy images, require hyperparameter tuning for each new dataset, or rely on tedious …

@arXiv_csNE_bot@mastoxiv.page
2025-06-11 07:44:43

A Practical Guide to Tuning Spiking Neuronal Dynamics
William Gebhardt, Alexander G. Ororbia, Nathan McDonald, Clare Thiem, Jack Lombardi
https://arxiv.org/abs/2506.08138

A Practical Guide to Tuning Spiking Neuronal Dynamics
In this work, we examine fundamental elements of spiking neural networks (SNNs) as well as how to tune them. Concretely, we focus on two different foundational neuronal units utilized in SNNs -- the leaky integrate-and-fire (LIF) and the resonate-and-fire (RAF) neuron. We explore key equations and how hyperparameter values affect behavior. Beyond hyperparameters, we discuss other important design elements of SNNs -- the choice of input encoding and the setup for excitatory-inhibitory population…
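
To make "how hyperparameter values affect behavior" concrete for the LIF unit, here is a minimal forward-Euler simulation of the textbook form tau_m dv/dt = -(v - v_rest) + R I(t), with a hard threshold and reset. The parameter names and default values are assumptions for illustration; the paper's own equations and settings may differ, and the RAF neuron is not covered:

```python
from typing import List, Sequence


def simulate_lif(current: Sequence[float], dt: float = 1e-3, tau_m: float = 0.02,
                 v_rest: float = 0.0, v_reset: float = 0.0, v_thresh: float = 1.0,
                 resistance: float = 1.0) -> List[int]:
    """Forward-Euler leaky integrate-and-fire neuron; returns the step indices of spikes."""
    v = v_rest
    spikes: List[int] = []
    for step, i_t in enumerate(current):
        # leak toward v_rest plus the injected current, scaled by dt / tau_m
        v += dt / tau_m * (-(v - v_rest) + resistance * i_t)
        if v >= v_thresh:
            spikes.append(step)
            v = v_reset  # hard reset after each spike
    return spikes


if __name__ == "__main__":
    print(len(simulate_lif([0.5] * 1000)))  # steady state R*I = 0.5 stays below threshold
    print(len(simulate_lif([2.0] * 1000)))  # suprathreshold drive fires periodically
```

With these defaults a constant input only produces spikes when R*I exceeds v_thresh, and raising tau_m or v_thresh lengthens the interval between spikes, which is the kind of sensitivity the abstract refers to.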

@arXiv_nlincd_bot@mastoxiv.page
2025-07-09 08:31:52

Minimal Deterministic Echo State Networks Outperform Random Reservoirs in Learning Chaotic Dynamics
Francesco Martinuzzi
https://arxiv.org/abs/2507.06050 h…

Minimal Deterministic Echo State Networks Outperform Random Reservoirs in Learning Chaotic Dynamics
Machine learning (ML) is widely used to model chaotic systems. Among ML approaches, echo state networks (ESNs) have received considerable attention due to their simple construction and fast training. However, ESN performance is highly sensitive to hyperparameter choices and to its random initialization. In this work, we demonstrate that ESNs constructed using deterministic rules and simple topologies (MESNs) outperform standard ESNs in the task of chaotic attractor reconstruction. We use a data…
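
As an illustration of a reservoir built by deterministic rules rather than random draws, the sketch below uses a ring (cycle) topology with one shared weight r and equal-magnitude input weights whose signs follow the decimal digits of pi, plus a ridge-regression readout. This specific construction is an assumption for illustration and is not necessarily the MESN design in the paper; the demo uses a sine wave only as a quick sanity check, and a chaotic series (e.g. a Lorenz trajectory) could be substituted:

```python
import numpy as np

# First 50 decimal digits of pi, used as a fixed, aperiodic sign pattern (assumption).
PI_DIGITS = "31415926535897932384626433832795028841971693993751"


def cycle_reservoir(n: int, r: float) -> np.ndarray:
    """Ring topology: unit i feeds unit (i + 1) mod n with the same weight r."""
    w = np.zeros((n, n))
    for i in range(n):
        w[(i + 1) % n, i] = r
    return w


def input_weights(n: int, v: float) -> np.ndarray:
    """Equal-magnitude input weights; sign is + if the i-th digit of pi is >= 5, else -."""
    if n > len(PI_DIGITS):
        raise ValueError("extend PI_DIGITS for larger reservoirs")
    return np.array([v if int(PI_DIGITS[i]) >= 5 else -v for i in range(n)])


def collect_states(u: np.ndarray, w: np.ndarray, w_in: np.ndarray) -> np.ndarray:
    """Drive the tanh reservoir with a scalar input series and record every state."""
    x = np.zeros(w.shape[0])
    states = np.empty((len(u), w.shape[0]))
    for t, u_t in enumerate(u):
        x = np.tanh(w @ x + w_in * u_t)
        states[t] = x
    return states


def fit_ridge(states: np.ndarray, targets: np.ndarray, ridge: float = 1e-8) -> np.ndarray:
    """Ridge-regression readout with an appended bias column."""
    x = np.hstack([states, np.ones((len(states), 1))])
    return np.linalg.solve(x.T @ x + ridge * np.eye(x.shape[1]), x.T @ targets)


def predict(states: np.ndarray, w_out: np.ndarray) -> np.ndarray:
    x = np.hstack([states, np.ones((len(states), 1))])
    return x @ w_out


if __name__ == "__main__":
    u = np.sin(2 * np.pi * np.arange(1200) / 25)
    s = collect_states(u, cycle_reservoir(50, 0.8), input_weights(50, 0.05))
    washout = 200
    w_out = fit_ridge(s[washout:-1], u[washout + 1:])  # one-step-ahead prediction
    pred = predict(s[washout:-1], w_out)
    print(np.mean((pred - u[washout + 1:]) ** 2) / np.var(u[washout + 1:]))
```

Because nothing is sampled, rebuilding the reservoir always gives the same model, which removes the run-to-run variance that the abstract attributes to random initialization.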

@arXiv_csLG_bot@mastoxiv.page
2025-07-09 10:20:22

Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters
Marco Roschkowski
https://arxiv.org/abs/2507.05807 https://

Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters
In this paper, we tackle two fundamental problems in few-shot domain adaptation of foundation models. First, hyperparameter tuning is often impractical due to the lack of large validation datasets. Second, model robustness under distribution shifts, where test-time data deviate slightly from the training distribution, remains a concern. We show that by training multiple independent adapters and averaging their outputs, the new model achieves higher performance and is more robust to distribution shift…
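
The core operation described here, averaging the outputs of several independently trained adapters, can be sketched as follows. Each adapter is assumed to be a hypothetical linear head on frozen foundation-model features, and logits are averaged before the softmax; whether the paper averages logits or probabilities, and what its adapters look like, is not stated in this excerpt:

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Adapter = Tuple[Matrix, List[float]]  # (weight rows, bias), one row per class


def linear_head(weights: Matrix, bias: Sequence[float], features: Sequence[float]) -> List[float]:
    """Logits of one hypothetical linear adapter applied to frozen features."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def averaged_logits(adapters: Sequence[Adapter], features: Sequence[float]) -> List[float]:
    """Element-wise mean of the logits produced by every adapter."""
    outs = [linear_head(w, b, features) for w, b in adapters]
    return [sum(col) / len(outs) for col in zip(*outs)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    a: Adapter = ([[1, 0], [0, 1]], [0, 0])
    b: Adapter = ([[3, 0], [0, -1]], [0, 0])
    print(softmax(averaged_logits([a, b], [1, 2])))
```

Since each adapter is trained independently, no validation set is needed to pick one of them, which matches the abstract's motivation of avoiding hyperparameter tuning.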

@arXiv_statML_bot@mastoxiv.page
2025-05-30 10:16:22

This https://arxiv.org/abs/2502.06044 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_sta…

Scalable Differentially Private Bayesian Optimization
In recent years, there has been much work on scaling Bayesian Optimization to high-dimensional problems, for example hyperparameter tuning in large machine learning models. These scalable methods have been successful, finding high objective values much more quickly than traditional global Bayesian Optimization or random search-based methods. At the same time, these large models often use sensitive data, but preservation of Differential Privacy has not scaled alongside these modern Bayesian Opti…

@arXiv_statAP_bot@mastoxiv.page
2025-08-01 08:12:41

Efficient inference of dynamic gene regulatory networks using discrete penalty
Visweswaran Ravikumar, Aaresh Bhathena, Wajd N Al-Holou, Salar Fattahi, Arvind Rao
https://arxiv.org/abs/2507.23106

Efficient inference of dynamic gene regulatory networks using discrete penalty
Gene regulatory networks (GRNs) orchestrate cellular decision making and survival strategies. Inferring the structure of these networks from high-dimensional transcriptomics data is a central challenge in systems biology. Traditional approaches to GRN inference, such as the graphical lasso and its joint extensions, rely on $\ell_1$ penalty to induce sparsity but can bias network recovery and require extensive hyperparameter tuning. Here, we present a scalable framework for the joint inference o…
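
The bias of the $\ell_1$ penalty mentioned here is easiest to see through proximal operators: the $\ell_1$ prox (soft thresholding) shrinks every surviving entry toward zero by lam, whereas the prox of an $\ell_0$ (discrete) penalty (hard thresholding) keeps large entries unchanged. This is a generic illustration under the assumption that "discrete penalty" means an $\ell_0$-type penalty; it is not the paper's inference framework:

```python
import math
from typing import List


def soft_threshold(z: List[float], lam: float) -> List[float]:
    """Prox of lam * ||x||_1: shrinks every entry toward zero by lam (biased for large entries)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]


def hard_threshold(z: List[float], lam: float) -> List[float]:
    """Prox of lam * ||x||_0 under 0.5 * squared loss: keep v unchanged iff v**2 / 2 > lam."""
    cut = math.sqrt(2.0 * lam)
    return [v if abs(v) > cut else 0.0 for v in z]


if __name__ == "__main__":
    z = [3.0, -0.5, 0.25]
    print(soft_threshold(z, 0.5))  # the large entry 3.0 is shrunk
    print(hard_threshold(z, 0.5))  # the large entry 3.0 is kept as-is
```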

@arXiv_csLG_bot@mastoxiv.page
2025-06-10 19:23:11

This https://arxiv.org/abs/2506.05673 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…

Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery
The development of modern Artificial Intelligence (AI) models, particularly diffusion-based models employed in computer vision and image generation tasks, is undergoing a paradigmatic shift in development methodologies. Traditionally dominated by a "Model Centric" approach, in which performance gains were primarily pursued through increasingly complex model architectures and hyperparameter optimization, the field is now recognizing a more nuanced "Data-Centric" approach. This emergent framework…

@arXiv_mathOC_bot@mastoxiv.page
2025-06-02 07:27:33

Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning
Jinbao Wang, Shiliang Zhang, Jun Liu, Xuehui Ma, Haolin Liu
https://arxiv.org/abs/2505.24572

Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning
Data-enabled predictive control (DeePC) leverages system measurements to characterize system dynamics for optimal control. The performance of DeePC relies on optimizing its hyperparameters, especially in noisy systems where the optimal hyperparameters vary over time. Existing hyperparameter tuning approaches for DeePC are more often than not computationally inefficient or overly conservative. This paper proposes an adaptive DeePC where we guide its hyperparameter adaptation through reinforcement …
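
In DeePC, the "system measurements" enter as block-Hankel matrices built from recorded input and output trajectories, split into past rows (the initialization window T_ini) and future rows (the prediction horizon). The sketch below builds those matrices in pure Python; block_hankel and split_past_future are hypothetical helper names, and the abstract does not say which hyperparameters the paper adapts (T_ini and the regularization weights of the common regularized formulation are typical candidates):

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def block_hankel(signal: Sequence[Sequence[float]], depth: int) -> Matrix:
    """Block-Hankel matrix with `depth` block rows from a length-T vector signal.

    Column j stacks signal[j], signal[j + 1], ..., signal[j + depth - 1],
    so the matrix has depth * m rows and T - depth + 1 columns.
    """
    T = len(signal)
    if not 1 <= depth <= T:
        raise ValueError("depth must be between 1 and the signal length")
    m = len(signal[0])
    cols = T - depth + 1
    return [[signal[j + i][k] for j in range(cols)] for i in range(depth) for k in range(m)]


def split_past_future(h: Matrix, t_ini: int, m: int) -> Tuple[Matrix, Matrix]:
    """Split the block rows into the first t_ini blocks (past) and the rest (future)."""
    return h[: t_ini * m], h[t_ini * m:]


if __name__ == "__main__":
    u = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # hypothetical scalar input record
    past, future = split_past_future(block_hankel(u, depth=3), t_ini=1, m=1)
    print(past, future)
```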

@arXiv_csLG_bot@mastoxiv.page
2025-06-05 10:52:58

This https://arxiv.org/abs/2503.22733 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…

RBFleX-NAS: Training-Free Neural Architecture Search Using Radial Basis Function Kernel and Hyperparameter Detection
Neural Architecture Search (NAS) is an automated technique to design optimal neural network architectures for a specific workload. Conventionally, evaluating candidate networks in NAS involves extensive training, which requires significant time and computational resources. To address this, training-free NAS has been proposed to expedite network evaluation with minimal search time. However, state-of-the-art training-free NAS algorithms struggle to precisely distinguish well-performing networks f…

@arXiv_mathOC_bot@mastoxiv.page
2025-06-30 12:54:50

Replaced article(s) found for math.OC. https://arxiv.org/list/math.OC/new
[1/1]:
- Global relaxation-based LP-Newton method for multiple hyperparameter selection in support vector ...
Yaru Qian, Qingna Li, Alain Zemkoho

@arXiv_csLG_bot@mastoxiv.page
2025-06-03 21:31:55

This https://arxiv.org/abs/2505.00812 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…

Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization
Recent studies indicate that deep neural networks degrade in generalization performance under noisy supervision. Existing methods focus on isolating clean subsets or correcting noisy labels, facing limitations such as high computational costs, heavy hyperparameter tuning, and coarse-grained optimization. To address these challenges, we propose a novel two-stage noisy learning framework that enables instance-level optimization through a dynamically weighted loss function, avoiding hyperp…

@arXiv_csLG_bot@mastoxiv.page
2025-07-14 07:56:42

Tree-Structured Parzen Estimator Can Solve Black-Box Combinatorial Optimization More Efficiently
Kenshin Abe, Yunzhuo Wang, Shuhei Watanabe
https://arxiv.org/abs/2507.08053 https://arxiv.org/pdf/2507.08053 https://arxiv.org/html/2507.08053
arXiv:2507.08053v1 Announce Type: new
Abstract: Tree-structured Parzen estimator (TPE) is a versatile hyperparameter optimization (HPO) method supported by popular HPO tools. Since these HPO tools have been developed in line with the trend of deep learning (DL), problem setups common in the DL domain, such as multi-objective optimization and multi-fidelity optimization, have been discussed for TPE. However, the practical applications of HPO are not limited to DL, and black-box combinatorial optimization is actively used in some domains, e.g., chemistry and biology. As combinatorial optimization has been an unexplored, yet very important, topic for TPE, we propose an efficient combinatorial optimization algorithm for TPE. In this paper, we first generalize the categorical kernel with the numerical kernel in TPE, enabling us to introduce a distance structure to the categorical kernel. Then we discuss modifications to the newly developed kernel to handle a large combinatorial search space. These modifications reduce the time complexity of the kernel calculation with respect to the size of a combinatorial search space. In experiments on synthetic problems, we verified that our proposed method identifies better solutions with fewer evaluations than the original TPE. Our algorithm is available in Optuna, an open-source framework for HPO.
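
For context on what the paper extends, the basic TPE loop for a single categorical parameter splits past trials into a "good" fraction gamma and the rest, fits a smoothed categorical distribution l(x) to the good ones and g(x) to the others, then picks the candidate sampled from l(x) that maximizes l(x)/g(x). The sketch below is a simplified illustration with hypothetical names and defaults; it is not Optuna's TPESampler, and it deliberately lacks the distance structure between categories that the paper introduces:

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def categorical_density(observed: Sequence[Hashable], choices: Sequence[Hashable],
                        prior_weight: float = 1.0) -> Dict[Hashable, float]:
    """Smoothed histogram: every choice starts with `prior_weight` pseudo-counts."""
    counts = {c: prior_weight for c in choices}
    for v in observed:
        counts[v] += 1.0
    total = sum(counts.values())
    return {c: counts[c] / total for c in choices}


def tpe_categorical(objective: Callable[[Hashable], float], choices: Sequence[Hashable],
                    n_trials: int = 40, n_startup: int = 10, gamma: float = 0.25,
                    n_candidates: int = 24, seed: int = 0) -> Tuple[Hashable, float]:
    """Minimize `objective` over one categorical parameter with a TPE-style loop."""
    rng = random.Random(seed)
    history: List[Tuple[Hashable, float]] = []
    for trial in range(n_trials):
        if trial < n_startup:
            x = rng.choice(list(choices))  # random warm-up trials
        else:
            ranked = sorted(history, key=lambda h: h[1])
            n_good = max(1, int(gamma * len(ranked)))
            l = categorical_density([h[0] for h in ranked[:n_good]], choices)  # "good" model
            g = categorical_density([h[0] for h in ranked[n_good:]], choices)  # "bad" model
            cands = rng.choices(list(choices), weights=[l[c] for c in choices], k=n_candidates)
            x = max(cands, key=lambda c: l[c] / g[c])  # maximize l(x) / g(x)
        history.append((x, objective(x)))
    return min(history, key=lambda h: h[1])


if __name__ == "__main__":
    print(tpe_categorical(lambda x: (x - 13) ** 2, list(range(20))))
```

Here every category is unrelated to every other, so a trial at 12 says nothing about 13; introducing a distance between categories, as the abstract describes, is what lets information transfer across neighboring choices.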
