Tootfinder

Opt-in global Mastodon full-text search. Join the index!

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:16:19

DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Yuanjun Dai, Keqiang He, An Wang
arxiv.org/abs/2510.08522

@arXiv_csCR_bot@mastoxiv.page
2025-09-10 08:13:31

Sequentially Auditing Differential Privacy
Tomás González, Mateo Dulce-Rubio, Aaditya Ramdas, Mónica Ribero
arxiv.org/abs/2509.07055

@arXiv_csDC_bot@mastoxiv.page
2025-10-10 08:03:19

Adaptive Execution Scheduler for DataDios SmartDiff
Aryan Poduri
arxiv.org/abs/2510.07811 arxiv.org/pdf/2510.07811

@arXiv_csRO_bot@mastoxiv.page
2025-08-29 09:47:41

Deep Fuzzy Optimization for Batch-Size and Nearest Neighbors in Optimal Robot Motion Planning
Liding Zhang, Qiyang Zong, Yu Zhang, Zhenshan Bing, Alois Knoll
arxiv.org/abs/2508.20884

@arXiv_csIT_bot@mastoxiv.page
2025-08-05 10:18:11

The Length of Functional Batch and PIR Codes
Altan B. Kilic, Alberto Ravagnani, Flavio Salizzoni
arxiv.org/abs/2508.02586 arxiv.org/pdf/250…

@arXiv_mathOC_bot@mastoxiv.page
2025-08-05 11:00:30

ASPEN: An Additional Sampling Penalty Method for Finite-Sum Optimization Problems with Nonlinear Equality Constraints
Nataša Krejić, Nataša Krklec Jerinkić, Tijana Ostojić, Nemanja Vučićević
arxiv.org/abs/2508.02299

@arXiv_csLG_bot@mastoxiv.page
2025-09-22 10:33:31

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
Yuen Chen, Yian Wang, Hari Sundaram
arxiv.org/abs/2509.16173

@arXiv_csDC_bot@mastoxiv.page
2025-10-01 08:25:37

Efficient Distributed Training via Dual Batch Sizes and Cyclic Progressive Learning
Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu, Jan-Jan Wu
arxiv.org/abs/2509.26092

@pbloem@sigmoid.social
2025-07-18 09:25:22

Now out in #TMLR:
🍇 GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks 🍇
There's lots of work on sampling subgraphs for GNNs, but relatively little on making this sampling process _adaptive_. That is, learning to select the data from the graph that is relevant for your task.
We introduce an RL-based and a GFlowNet-based sampler and show that the approach perf…

A diagram of the GRAPES pipeline. It shows a subgraph being sampled in two steps and being fed to a GNN, with a blue line showing the learning signal. The caption reads Figure 1: Overview of GRAPES. First, GRAPES processes a target node (green) by computing node inclusion probabilities on its 1-hop neighbors (shown by node color shade) with a sampling GNN. Given these probabilities, GRAPES samples k nodes. Then, GRAPES repeats this process over nodes in the 2-hop neighborhood. We pass the sampl…
A results table for node classification on heterophilious graphs. Table 2: F1-scores (%) for different sampling methods trained on heterophilous graphs for a batch size of 256, and a sample size of 256 per layer. We report the mean and standard deviation over 10 runs. The best values among the sampling baselines (all except GAS) are in bold, and the second best are underlined. MC stands for multi-class and ML stands for multi-label classification. OOM indicates out of memory.
Performance of samplers vs sampling size, showing that GRAPES generally performs well across sample sizes, while other samplers often show more variance across sample sizes. The caption reads Figure 4: Comparative analysis of classification accuracy across different sampling sizes for sampling baselines and GRAPES. We repeated each experiment five times; the shaded regions show the 95% confidence intervals.
A diagrammatic illustration of a graph classification task used in one of the theorems. The caption reads Figure 9: An example of a graph for Theorem 1 with eight nodes. Red edges belong to E1, features xi and labels yi are shown beside every node. For nodes v1 and v2 we show the edge e12 as an example. As shown, the label of each node is the second feature of its neighbor, where a red edge connects them. The edge homophily ratio is h = 12/28 ≈ 0.43.

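For readers wondering what "learning to sample" means mechanically, below is a minimal, purely illustrative Python/PyTorch sketch of the two-step procedure the GRAPES figure describes: score a target node's 1-hop neighbours with a small sampler network, keep k of them, then repeat on the 2-hop frontier. Everything here (the toy adjacency, keep_top_k, the linear scorer) is an assumption for illustration, not the paper's API; the actual method samples stochastically and trains the sampler with an RL or GFlowNet objective rather than the deterministic top-k used here.

# Illustrative sketch only (not the authors' code): score 1-hop neighbours with a
# small stand-in "sampling" network, keep k of them, then repeat on the 2-hop
# frontier of the kept nodes. Adjacency, features and scorer are toy assumptions.
import torch

def keep_top_k(scores: torch.Tensor, k: int) -> torch.Tensor:
    # Turn raw scores into inclusion probabilities and keep the k most likely
    # neighbours (a learned sampler would sample from these probabilities).
    probs = torch.sigmoid(scores)
    return torch.topk(probs, min(k, probs.numel())).indices

adjacency = {0: [1, 2, 3, 4], 1: [0, 5, 6], 2: [0, 7], 3: [0], 4: [0, 8]}  # toy graph
features = torch.randn(9, 16)        # 9 nodes, 16-dimensional random features
scorer = torch.nn.Linear(16, 1)      # stand-in for the sampling GNN

target, k = 0, 2
hop1 = torch.tensor(adjacency[target])
kept1 = hop1[keep_top_k(scorer(features[hop1]).squeeze(-1), k)]

# Step 2: gather the 2-hop frontier of the kept nodes and subsample it the same way.
frontier = sorted({n for v in kept1.tolist() for n in adjacency.get(v, []) if n != target})
hop2 = torch.tensor(frontier)
kept2 = hop2[keep_top_k(scorer(features[hop2]).squeeze(-1), k)]

print("kept 1-hop nodes:", kept1.tolist(), "kept 2-hop nodes:", kept2.tolist())
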
@arXiv_statML_bot@mastoxiv.page
2025-10-02 09:20:40

Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Blake Bordelon, Mary I. Letey, Cengiz Pehlevan
arxiv.org/abs/2510.01098

@arXiv_csDB_bot@mastoxiv.page
2025-09-03 08:31:03

Access Paths for Efficient Ordering with Large Language Models
Fuheng Zhao, Jiayue Chen, Yiming Pan, Tahseen Rabbani, Divyakant Agrawal, Amr El Abbadi
arxiv.org/abs/2509.00303

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:38:29

NeST-BO: Fast Local Bayesian Optimization via Newton-Step Targeting of Gradient and Hessian Information
Wei-Ting Tang, Akshay Kudva, Joel A. Paulson
arxiv.org/abs/2510.05516

@arXiv_physicscompph_bot@mastoxiv.page
2025-09-05 08:23:01

A Highly Scalable TDMA for GPUs and Its Application to Flow Solver Optimization
Seungchan Kim, Jihoo Kim, Sanghyun Ha, Donghyun You
arxiv.org/abs/2509.03933

@arXiv_statML_bot@mastoxiv.page
2025-07-24 09:01:50

Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability
Dirk van der Hoeven, Julia Olkhovskaia, Tim van Erven
arxiv.org/abs/2507.17316

@arXiv_csCR_bot@mastoxiv.page
2025-08-22 08:59:11

Tighter Privacy Analysis for Truncated Poisson Sampling
Arun Ganesh
arxiv.org/abs/2508.15089 arxiv.org/pdf/2508.15089

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:15:20

Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches
Yishun Lu, Wesley Armour
arxiv.org/abs/2508.13898 arxi…

@arXiv_csIR_bot@mastoxiv.page
2025-09-15 07:38:41

Faster and Memory-Efficient Training of Sequential Recommendation Models for Large Catalogs
Maxim Zhelnin, Dmitry Redko, Daniil Volkov, Anna Volodkevich, Petr Sokerin, Valeriy Shevchenko, Egor Shvetsov, Alexey Vasilev, Darya Denisova, Ruslan Izmailov, Alexey Zaytsev
arxiv.org/abs/2509.09682

@arXiv_csDC_bot@mastoxiv.page
2025-09-30 10:44:41

SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
Qihui Zhou, Peiqi Yin, Pengfei Zuo, James Cheng
arxiv.org/abs/2509.24626

@arXiv_csLG_bot@mastoxiv.page
2025-10-02 11:10:11

Prompt Curriculum Learning for Efficient LLM Post-Training
Zhaolin Gao, Joongwon Kim, Wen Sun, Thorsten Joachims, Sid Wang, Richard Yuanzhe Pang, Liang Tan
arxiv.org/abs/2510.01135

@arXiv_csLG_bot@mastoxiv.page
2025-09-30 14:40:41

Efficient Hyperparameter Tuning via Trajectory Invariance Principle
Bingrui Li, Jiaxin Wen, Zhanpeng Zhou, Jun Zhu, Jianfei Chen
arxiv.org/abs/2509.25049

@arXiv_csLG_bot@mastoxiv.page
2025-08-20 10:04:40

GRAFT: Gradient-Aware Fast MaxVol Technique for Dynamic Data Sampling
Ashish Jha, Anh-Huy Phan, Razan Dibo, Valentin Leplat
arxiv.org/abs/2508.13653

@arXiv_csDC_bot@mastoxiv.page
2025-07-17 09:49:20

Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage
Junqing Lin, Jingwei Sun, Mingge Lu, Guangzhong Sun
arxiv.org/abs/2507.12205

@arXiv_csLG_bot@mastoxiv.page
2025-09-23 12:49:50

Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
Haocheng Luo, Mehrtash Harandi, Dinh Phung, Trung Le
arxiv.org/abs/2509.18001

@arXiv_csLG_bot@mastoxiv.page
2025-09-15 09:57:11

Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration
Ahmed Khaled, Satyen Kale, Arthur Douillard, Chi Jin, Rob Fergus, Manzil Zaheer
arxiv.org/abs/2509.10439