Tootfinder

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:16:19

DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Yuanjun Dai, Keqiang He, An Wang
https://arxiv.org/abs/2510.08522 https://

Existing batch size selection approaches in distributed machine learning rely on static allocation or simplistic heuristics that fail to adapt to heterogeneous, dynamic computing environments. We present DYNAMIX, a reinforcement learning framework that formulates batch size optimization as a sequential decision-making problem using Proximal Policy Optimization (PPO). Our approach employs a multi-dimensional state representation encompassing network-level metrics, system-level resource uti…

@arXiv_csCR_bot@mastoxiv.page
2025-09-10 08:13:31

Sequentially Auditing Differential Privacy
Tomás González, Mateo Dulce-Rubio, Aaditya Ramdas, Mónica Ribero
https://arxiv.org/abs/2509.07055 https://

We propose a practical sequential test for auditing differential privacy guarantees of black-box mechanisms. The test processes streams of mechanisms' outputs providing anytime-valid inference while controlling Type I error, overcoming the fixed sample size limitation of previous batch auditing methods. Experiments show this test detects violations with sample sizes that are orders of magnitude smaller than existing methods, reducing this number from 50K to a few hundred examples, across divers…

@arXiv_csDC_bot@mastoxiv.page
2025-10-10 08:03:19

Adaptive Execution Scheduler for DataDios SmartDiff
Aryan Poduri
https://arxiv.org/abs/2510.07811 https://arxiv.org/pdf/2510.07811

We present an adaptive scheduler for a single differencing engine (SmartDiff) with two execution modes: (i) in-memory threads and (ii) Dask based parallelism. The scheduler continuously tunes batch size and worker/thread count within fixed CPU and memory budgets to minimize p95 latency. A lightweight preflight profiler estimates bytes/row and I/O rate; an online cost/memory model prunes unsafe actions; and a guarded hill-climb policy favors lower latency with backpressure and straggler mitigati…

@arXiv_csRO_bot@mastoxiv.page
2025-08-29 09:47:41

Deep Fuzzy Optimization for Batch-Size and Nearest Neighbors in Optimal Robot Motion Planning
Liding Zhang, Qiyang Zong, Yu Zhang, Zhenshan Bing, Alois Knoll
https://arxiv.org/abs/2508.20884

Efficient motion planning algorithms are essential in robotics. Optimizing essential parameters, such as batch size and nearest neighbor selection in sampling-based methods, can enhance performance in the planning process. However, existing approaches often lack environmental adaptability. Inspired by the method of the deep fuzzy neural networks, this work introduces Learning-based Informed Trees (LIT*), a sampling-based deep fuzzy learning-based planner that dynamically adjusts batch size and …

@arXiv_csIT_bot@mastoxiv.page
2025-08-05 10:18:11

The Length of Functional Batch and PIR Codes
Altan B. Kilic, Alberto Ravagnani, Flavio Salizzoni
https://arxiv.org/abs/2508.02586 https://arxiv.org/pdf/250…

We consider the problem of computing the minimum length of functional batch and PIR codes of fixed dimension and for a fixed list size, over an arbitrary finite field. We recover, generalize, and refine several results that were previously obtained for binary codes. We present new upper and lower bounds for the minimum length, and discuss the asymptotic behaviour of this parameter. We also compute its value for several parameter sets. The paper also offers insights into the "correct" list size …

@arXiv_mathOC_bot@mastoxiv.page
2025-08-05 11:00:30

ASPEN: An Additional Sampling Penalty Method for Finite-Sum Optimization Problems with Nonlinear Equality Constraints
Nataša Krejić, Nataša Krklec Jerinkić, Tijana Ostojić, Nemanja Vučićević
https://arxiv.org/abs/2508.02299

We propose a novel algorithm for solving non-convex, nonlinear equality-constrained finite-sum optimization problems. The proposed algorithm incorporates an additional sampling strategy for sample size update into the well-known framework of quadratic penalty methods. Thus, depending on the problem at hand, the resulting method may exhibit a sample size strategy ranging from a mini-batch on one end, to increasing sample size that achieves the full sample eventually, on the other end of the spec…

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:33:31

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
Yuen Chen, Yian Wang, Hari Sundaram
https://arxiv.org/abs/2509.16173 https://

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants are widely used to train deep neural networks. In contrast to traditional approaches that focus on tuning the learning rate, we propose a novel adaptive batch size SGD algorithm, DiveBatch, that dynamically adjusts the batch size. Adapting the batch size is c…

@arXiv_csDC_bot@mastoxiv.page
2025-10-01 08:25:37

Efficient Distributed Training via Dual Batch Sizes and Cyclic Progressive Learning
Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu, Jan-Jan Wu
https://arxiv.org/abs/2509.26092 https:…

Distributed machine learning is critical for training deep learning models on large datasets and with numerous parameters. Current research primarily focuses on leveraging additional hardware resources and powerful computing units to accelerate the training process. As a result, larger batch sizes are often employed to speed up training. However, training with large batch sizes can lead to lower accuracy due to poor generalization. To address this issue, we propose the dual batch size learning …

@arXiv_statML_bot@mastoxiv.page
2025-10-02 09:20:40

Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Blake Bordelon, Mary I. Letey, Cengiz Pehlevan
https://arxiv.org/abs/2510.01098 https://

We study in-context learning (ICL) of linear regression in a deep linear self-attention model, characterizing how performance depends on various computational and statistical resources (width, depth, number of training steps, batch size and data per context). In a joint limit where data dimension, context length, and residual stream width scale proportionally, we analyze the limiting asymptotics for three ICL settings: (1) isotropic covariates and tasks (ISO), (2) fixed and structured covarianc…

@arXiv_csDB_bot@mastoxiv.page
2025-09-03 08:31:03

Access Paths for Efficient Ordering with Large Language Models
Fuheng Zhao, Jiayue Chen, Yiming Pan, Tahseen Rabbani, Divyakant Agrawal, Amr El Abbadi
https://arxiv.org/abs/2509.00303

We present the LLM ORDER BY operator as a logical abstraction and study its physical implementations within a unified evaluation framework. Our experiments show that no single approach is universally optimal, with effectiveness depending on query characteristics and data. We introduce three new designs: an agreement-based batch-size policy, a majority voting mechanism for pairwise sorting, and a two-way external merge sort adapted for LLMs. With extensive experiments, our agreement-based proced…

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:38:29

NeST-BO: Fast Local Bayesian Optimization via Newton-Step Targeting of Gradient and Hessian Information
Wei-Ting Tang, Akshay Kudva, Joel A. Paulson
https://arxiv.org/abs/2510.05516

Bayesian optimization (BO) is effective for expensive black-box problems but remains challenging in high dimensions. We propose NeST-BO, a local BO method that targets the Newton step by jointly learning gradient and Hessian information with Gaussian process surrogates, and selecting evaluations via a one-step lookahead bound on Newton-step error. We show that this bound (and hence the step error) contracts with batch size, so NeST-BO directly inherits inexact-Newton convergence: global progres…

@arXiv_physicscompph_bot@mastoxiv.page
2025-09-05 08:23:01

A Highly Scalable TDMA for GPUs and Its Application to Flow Solver Optimization
Seungchan Kim, Jihoo Kim, Sanghyun Ha, Donghyun You
https://arxiv.org/abs/2509.03933 https://

A tridiagonal matrix algorithm (TDMA), Pipelined-TDMA, is developed for multi-GPU systems to resolve the scalability bottlenecks caused by the sequential structure of conventional divide-and-conquer TDMA. The proposed method pipelines multiple tridiagonal systems, overlapping communication with computation and executing GPU kernels concurrently to hide non-scalable stages behind scalable compute stages. To maximize performance, the batch size is optimized to strike a balance between GPU occupan…

@arXiv_statML_bot@mastoxiv.page
2025-07-24 09:01:50

Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability
Dirk van der Hoeven, Julia Olkhovskaia, Tim van Erven
https://arxiv.org/abs/2507.17316

We consider the problem of estimating a discrete distribution $p$ with support of size $K$ and provide both upper and lower bounds with high probability in KL divergence. We prove that in the worst case, for any estimator $\widehat{p}$, with probability at least $δ$, $\text{KL}(p \| \widehat{p}) \geq C\max\{K,\ln(K)\ln(1/δ) \}/n $, where $n$ is the sample size and $C > 0$ is a constant. We introduce a computationally efficient estimator $p^{\text{OTB}}$, based on Online to Batch conversi…

@arXiv_csCR_bot@mastoxiv.page
2025-08-22 08:59:11

Tighter Privacy Analysis for Truncated Poisson Sampling
Arun Ganesh
https://arxiv.org/abs/2508.15089 https://arxiv.org/pdf/2508.15089

We give a new privacy amplification analysis for truncated Poisson sampling, a Poisson sampling variant that truncates a batch if it exceeds a given maximum batch size.

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:15:20

Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches
Yishun Lu, Wesley Armour
https://arxiv.org/abs/2508.13898 https://arxi…

Modern GPUs are equipped with large amounts of high-bandwidth memory, enabling them to support mini-batch sizes of up to tens of thousands of training samples. However, most existing optimizers struggle to perform effectively at such a large batch size. As batch size increases, gradient noise decreases due to averaging over many samples, limiting the ability of first-order methods to escape sharp or suboptimal minima and reach the global minimum. Meanwhile, second-order methods like the natural…

@arXiv_csIR_bot@mastoxiv.page
2025-09-15 07:38:41

Faster and Memory-Efficient Training of Sequential Recommendation Models for Large Catalogs
Maxim Zhelnin, Dmitry Redko, Volkov Daniil, Anna Volodkevich, Petr Sokerin, Valeriy Shevchenko, Egor Shvetsov, Alexey Vasilev, Darya Denisova, Ruslan Izmailov, Alexey Zaytsev
https://arxiv.org/abs/2509.09682…

Sequential recommendations (SR) with transformer-based architectures are widely adopted in real-world applications, where SR models require frequent retraining to adapt to ever-changing user preferences. However, training transformer-based SR models often encounters a high computational cost associated with scoring extensive item catalogs, often exceeding thousands of items. This occurs mainly due to the use of cross-entropy loss, where peak memory scales proportionally to catalog size, batch s…

@arXiv_csDC_bot@mastoxiv.page
2025-09-30 10:44:41

SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
Qihui Zhou, Peiqi Yin, Pengfei Zuo, James Cheng
https://arxiv.org/abs/2509.24626 http…

Serving long-context LLMs is costly because attention computation grows linearly with context length. Dynamic sparse attention algorithms (DSAs) mitigate this by attending only to the key-value (KV) cache of critical tokens. However, with DSAs, the main performance bottleneck shifts from HBM bandwidth to HBM capacity: KV caches for unselected tokens must remain in HBM for low-latency decoding, constraining parallel batch size and stalling further throughput gains. Offloading these underutilized…

@arXiv_csLG_bot@mastoxiv.page
2025-10-02 11:10:11

Prompt Curriculum Learning for Efficient LLM Post-Training
Zhaolin Gao, Joongwon Kim, Wen Sun, Thorsten Joachims, Sid Wang, Richard Yuanzhe Pang, Liang Tan
https://arxiv.org/abs/2510.01135

We introduce Prompt Curriculum Learning (PCL), a lightweight reinforcement learning (RL) algorithm that selects intermediate-difficulty prompts using a learned value model to post-train language models. Since post-training LLMs via RL remains sensitive to batching and prompt selection strategies, we first conduct a series of systematic experiments where we (1) determine the optimal training batch size that balances generation efficiency and gradient quality and (2) establish the importance of f…

@arXiv_csLG_bot@mastoxiv.page
2025-09-30 14:40:41

Efficient Hyperparameter Tuning via Trajectory Invariance Principle
Bingrui Li, Jiaxin Wen, Zhanpeng Zhou, Jun Zhu, Jianfei Chen
https://arxiv.org/abs/2509.25049 https://…

As hyperparameter tuning becomes increasingly costly at scale, efficient tuning methods are essential. Yet principles for guiding hyperparameter tuning remain limited. In this work, we seek to establish such principles by considering a broad range of hyperparameters, including batch size, learning rate, and weight decay. We identify a phenomenon we call trajectory invariance, where pre-training loss curves, gradient noise, and gradient norm exhibit invariance--closely overlapping--with respect …

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:04:40

GRAFT: Gradient-Aware Fast MaxVol Technique for Dynamic Data Sampling
Ashish Jha, Anh Huy Phan, Razan Dibo, Valentin Leplat
https://arxiv.org/abs/2508.13653 https://

Training modern neural networks on large datasets is computationally and environmentally costly. We introduce GRAFT, a scalable in-training subset selection method that (i) extracts a low-rank feature representation for each batch, (ii) applies a Fast MaxVol sampler to select a small, diverse subset that spans the batch's dominant subspace, and (iii) dynamically adjusts the subset size using a gradient-approximation criterion. By operating in low-rank subspaces and training on carefully chosen …

@arXiv_csDC_bot@mastoxiv.page
2025-07-17 09:49:20

Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage
Junqing Lin, Jingwei Sun, Mingge Lu, Guangzhong Sun
https://arxiv.org/abs/2507.12205

Sparse Matrix-Vector Multiplication (SpMV) has become a critical performance bottleneck in the local deployment of sparse Large Language Models (LLMs), where inference predominantly operates on workloads during the decoder phase with a batch size of one. Existing SpMV kernels and sparse matrix formats, originally designed for scientific computing, fail to exploit the unique structure patterns inherent in sparse LLMs, resulting in suboptimal performance and excessive storage overhead. This paper…

@arXiv_csLG_bot@mastoxiv.page
2025-09-23 12:49:50

Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
Haocheng Luo, Mehrtash Harandi, Dinh Phung, Trung Le
https://arxiv.org/abs/2509.18001 https://

Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigated the phenomenon known as m-sharpness, where the performance of SAM improves monotonically as the micro-batch size for computing perturbations decreases. Leveraging an extended Stochastic Differential Equation (SDE) framework, combined with an analysis of the structure of stochastic gradient noise (SGN), we preci…

@arXiv_csLG_bot@mastoxiv.page
2025-09-15 09:57:11

Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration
Ahmed Khaled, Satyen Kale, Arthur Douillard, Chi Jin, Rob Fergus, Manzil Zaheer
https://arxiv.org/abs/2509.10439 …

Modern machine learning often requires training with large batch size, distributed data, and massively parallel compute hardware (like mobile and other edge devices or distributed data centers). Communication becomes a major bottleneck in such settings but methods like Local Stochastic Gradient Descent (Local SGD) show great promise in reducing this additional communication overhead. Local SGD consists of three parts: a local optimization process, an aggregation mechanism, and an outer optimize…
