SemiAnalysis launches InferenceMAX, an open-source benchmark that automatically tracks LLM inference performance across AI models and frameworks every night (Kimbo Chen/SemiAnalysis)
https://newsletter.semianalysis.com/p/inferencemax-open-source-inference
KV Cache Compression for Inference Efficiency in LLMs: A Review
Yanyu Liu (Shandong University of Science and Technology), Jingying Fu (Shandong University of Science and Technology), Sixiang Liu (Shandong University of Science and Technology), Yitian Zou (Shandong University of Science and Technology), You Fu (Shandong University of Science and Technology), Jiehan Zhou (Shandong University of Science and Technology), Shouhua Zhang (University of Oulu)
DMFI: Dual-Modality Fine-Tuning and Inference Framework for LLM-Based Insider Threat Detection
Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Guanggang Geng, Zhiying Li, Jian Weng
https://arxiv.org/abs/2508.05694
LIMFAST. IV. Learning High-Redshift Galaxy Formation from Multiline Intensity Mapping with Implicit Likelihood Inference
Guochao Sun, Tri Nguyen, Claude-André Faucher-Giguère, Adam Lidz, Tjitske Starkenburg, Bryan R. Scott, Tzu-Ching Chang, Steven R. Furlanetto
https://arxiv.org/abs/2509.07060
ReNiL: Relative Neural Inertial Locator with Any-Scale Bayesian Inference
Kaixuan Wu (School of Computer Science, Wuhan University, Wuhan, China, School of Cyber Science and Engineering, Wuhan University, Wuhan, China), Yuanzhuo Xu (School of Computer Science, Wuhan University, Wuhan, China), Zejun Zhang (University of Southern California, Los Angeles, United States), Weiping Zhu (School of Computer Science, Wuhan University, Wuhan, China), Steve Drew (Department of Electrical and Soft…
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning
Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana, Javier Ortega-Garcia
https://arxiv.org/abs/2509.07879
Handling Open-Vocabulary Constructs in Formalizing Specifications: Retrieval-Augmented Parsing with Expert Knowledge
Mohammad Saqib Hasan, Sayontan Ghosh, Dhruv Verma, Geoff Kuenning, Erez Zadok, Scott A. Smolka, Niranjan Balasubramanian
https://arxiv.org/abs/2509.08808
"[Chain of reasoning] reports are untrustworthy on principle: they are plausible explanations for plausible responses, and since the inferences involved are more complex, they burn more compute and carbon per query as well as introducing more mistakes"
This is a particularly offensive point about #LLMs: we actually do have a class of systems, inference engines, which do reason and can…
Comparison of Fully Homomorphic Encryption and Garbled Circuit Techniques in Privacy-Preserving Machine Learning Inference
Kalyan Cheerla (University of North Texas), Lotfi Ben Othmane (University of North Texas), Kirill Morozov (University of North Texas)
https://arxiv.org/abs/2510.07457
NanoCodec: Towards High-Quality Ultra Fast Speech LLM Inference
Edresson Casanova, Paarth Neekhara, Ryan Langman, Shehzeen Hussain, Subhankar Ghosh, Xuesong Yang, Ante Jukić, Jason Li, Boris Ginsburg
https://arxiv.org/abs/2508.05835
Towards Generalized Routing: Model and Agent Orchestration for Adaptive and Efficient Inference
Xiyu Guo, Shan Wang, Chunfang Ji, Xuefeng Zhao, Wenhao Xi, Yaoyao Liu, Qinglan Li, Chao Deng, Junlan Feng
https://arxiv.org/abs/2509.07571
MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
Songkai Ma, Zhaorui Zhang, Sheng Di, Benben Liu, Xiaodong Yu, Xiaoyi Lu, Dan Wang
https://arxiv.org/abs/2509.07727
Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs
Hyungjin Chung, Hyelin Nam, Jiyeon Kim, Hyojun Go, Byeongjun Park, Junho Kim, Joonseok Lee, Seongsu Ha, Byung-Hoon Kim
https://arxiv.org/abs/2509.08016
Taking the Weight Off: Mitigating Parameter Bias from Catastrophic Outliers in 3×2pt Analysis
Carolyn McDonald Mill, C. Danielle Leonard, Markus Michael Rau, Cora Uhlemann, Shahab Joudaki
https://arxiv.org/abs/2509.08052
Dynamic Features Adaptation in Networking: Toward Flexible training and Explainable inference
Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, Merim Dzaferagic, John D. Kelleher
https://arxiv.org/abs/2510.08303
Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference
Guoqing Ma, Jia Zhu, Hanghui Guo, Weijie Shi, Jiawei Shen, Jingjiang Liu, Yidan Liang
https://arxiv.org/abs/2509.08682
Baseten, which helps companies launch open-source or custom AI models, raised a $150M Series D led by Bond at a $2.15B valuation, up from $825M in February (Allie Garfinkle/Fortune)
https://fortune.com/2025/09/05/exclusive-b…
Fisher Random Walk: Automatic Debiasing Contextual Preference Inference for Large Language Model Evaluation
Yichi Zhang, Alexander Belloni, Ethan X. Fang, Junwei Lu, Xiaoan Xu
https://arxiv.org/abs/2509.05852
Hess-MC2: Sequential Monte Carlo Squared using Hessian Information and Second Order Proposals
Joshua Murphy, Conor Rosato, Andrew Millard, Lee Devlin, Paul Horridge, Simon Maskell
https://arxiv.org/abs/2507.07461
Staircase Streaming for Low-Latency Multi-Agent Inference
Junlin Wang, Jue Wang, Zhen Xu, Ben Athiwaratkun, Bhuwan Dhingra, Ce Zhang, James Zou
https://arxiv.org/abs/2510.05059
TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices
Sizhen Bian, Mengxi Liu, Vitor Fortes Rey, Daniel Geissler, Paul Lukowicz
https://arxiv.org/abs/2507.07949
DACIP-RC: Domain Adaptive Continual Instruction Pre-Training via Reading Comprehension on Business Conversations
Elena Khasanova, Harsh Saini, Md Tahmid Rahman Laskar, Xue-Yong Fu, Cheng Chen, Shashi Bhushan TN
https://arxiv.org/abs/2510.08152
Unleashing the True Potential of LLMs: A Feedback-Triggered Self-Correction with Long-Term Multipath Decoding
Jipeng Li, Zeyu Gao, Yubin Qi, Hande Dong, Weijian Chen, Qiang Lin
https://arxiv.org/abs/2509.07676
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving
Zhiyu Zheng, Shaoyu Chen, Haoran Yin, Xinbang Zhang, Jialv Zou, Xinggang Wang, Qian Zhang, Lefei Zhang
https://arxiv.org/abs/2510.08562
Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents
Sankalp Tattwadarshi Swain, Anshika Krishnatray, Dhruv Kumar, Jagat Sesh Challa
https://arxiv.org/abs/2509.07389
Enhancing Reasoning for Diffusion LLMs via Distribution Matching Policy Optimization
Yuchen Zhu, Wei Guo, Jaemoo Choi, Petr Molodyk, Bo Yuan, Molei Tao, Yongxin Chen
https://arxiv.org/abs/2510.08233
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Guanghao Li, Kerui Ren, Linning Xu, Zhewen Zheng, Changjian Jiang, Xin Gao, Bo Dai, Jian Pu, Mulin Yu, Jiangmiao Pang
https://arxiv.org/abs/2510.08551
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Jeffrey Amico, Gabriel Passamani Andrade, John Donaghy, Ben Fielding, Tristin Forbus, Harry Grieve, Semih Kara, Jari Kolehmainen, Yihua Lou, Christopher Nies, Edward Phillip Flores Nuño, Diogo Ortega, Shikhar Rastogi, Austin Virts, Matthew J. Wright
https://
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Ziyue Li, Yang Li, Tianyi Zhou
https://arxiv.org/abs/2507.07996 https://arxiv.org/pdf/2507.07996 https://arxiv.org/html/2507.07996
arXiv:2507.07996v1 Announce Type: new
Abstract: Can a pretrained neural network adapt its architecture to different inputs without any finetuning? Do we need all layers for simple tasks, and are they adequate for challenging tasks? We found that the layers of a pretrained large language model (LLM) can be manipulated as separate modules to build a better and even shallower model customized for each test sample. In particular, each layer from the pretrained model can be skipped/pruned or repeated multiple times as recurrent neural networks (RNN), and stacked with others in arbitrary orders, yielding a chain-of-layers (CoLa) per sample. This compositional space greatly expands the scope of existing works on looped/recurrent pretrained modules, layer pruning, or early-exit networks. We develop a Monte Carlo Tree Search (MCTS) protocol to explore and identify the optimal CoLa for each sample from math and commonsense reasoning benchmarks. Compared to a static model of a fixed depth, CoLa allows shortcut paths (fast thinking), recurrence of the same layer(s) (slow thinking), and combining both, offering more flexible, dynamic architectures for different inputs. We conduct an extensive analysis of the MCTS-optimized CoLa, which leads to two key findings: (1) For >75% of samples with correct predictions by the original LLM, we can find shorter CoLa, suggesting a large space for improving inference efficiency; (2) For >60% of samples with originally incorrect predictions, we can identify CoLa achieving correct predictions, suggesting a large space of performance enhancement. Our results highlight the shortcomings of using a fixed architecture of pre-trained LLMs for inference on different samples and pave the way to unlock the generalization power of test-time depth adaptation.
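The core idea above — treating a pretrained model's layers as reusable modules and searching for a per-sample chain that may skip or repeat layers — can be sketched in miniature. This is a hedged toy illustration, not the paper's method: the "layers" are stand-in scalar functions and the brute-force search below replaces the paper's MCTS protocol.

```python
# Toy chain-of-layers (CoLa) sketch: a "model" is a list of layer modules,
# and a per-sample path is a sequence of layer indices that may skip some
# layers entirely or repeat others. The exhaustive search is a stand-in
# for the paper's MCTS; the layers themselves are illustrative.
from itertools import product

# Stand-in "layers": simple functions on a scalar hidden state.
LAYERS = [
    lambda h: h + 1.0,   # layer 0
    lambda h: h * 2.0,   # layer 1
    lambda h: h - 0.5,   # layer 2
]

def run_cola(path, h0):
    """Apply layers in the order given by `path` (indices may be
    absent = skipped, or repeated = recurrent reuse)."""
    h = h0
    for i in path:
        h = LAYERS[i](h)
    return h

def best_cola(h0, target, max_len=4):
    """Brute-force stand-in for MCTS: among all paths up to `max_len`,
    return the first one whose output is closest to `target`."""
    best, best_err = (), abs(run_cola((), h0) - target)
    for length in range(1, max_len + 1):
        for path in product(range(len(LAYERS)), repeat=length):
            err = abs(run_cola(path, h0) - target)
            if err < best_err:
                best, best_err = path, err
    return best, best_err

path, err = best_cola(h0=0.0, target=3.0, max_len=3)
print(path, err)  # → (0, 0, 0) 0.0 — layer 0 repeated three times
```

Note how the selected path repeats a single layer, the scalar analogue of the "slow thinking" recurrence the abstract describes; a shorter-than-depth path would be the "fast thinking" shortcut.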