SemiAnalysis launches InferenceMAX, an open-source benchmark that automatically tracks LLM inference performance across AI models and frameworks every night (Kimbo Chen/SemiAnalysis)
https://newsletter.semianalysis.com/p/inferencemax-open-source-inference
Towards Confidential and Efficient LLM Inference with Dual Privacy Protection
Honglan Yu, Yibin Wang, Feifei Dai, Dong Liu, Haihui Fan, Xiaoyan Gu
https://arxiv.org/abs/2509.09091
Zero-shot Structure Learning and Planning for Autonomous Robot Navigation using Active Inference
Daria de Tinguy, Tim Verbelen, Emilio Gamba, Bart Dhoedt
https://arxiv.org/abs/2510.09574

Autonomous navigation in unfamiliar environments requires robots to simultaneously explore, localise, and plan under uncertainty, without relying on predefined maps or extensive training. We present a biologically inspired, Active Inference-based framework, Active Inference MAPping and Planning (AIMAPP). This model unifies mapping, localisation, and decision-making within a single generative model. Inspired by hippocampal navigation, it uses topological reasoning, place-cell encoding, and episo…
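The abstract above describes action selection by active inference: the agent holds a belief over states and picks the action minimizing expected free energy (risk plus ambiguity). AIMAPP's actual model is not reproduced here; the following is a generic toy sketch of that loop over a discrete state/observation model, with all matrices and numbers hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def expected_free_energy(qs, A, B_a, preferred_obs):
    """Expected free energy of one action: risk + ambiguity."""
    qs_next = B_a @ qs              # predicted state belief after the action
    qo = A @ qs_next                # predicted observation distribution
    # Risk: KL divergence from the preferred observation distribution
    risk = np.sum(qo * (np.log(qo + 1e-16) - np.log(preferred_obs + 1e-16)))
    # Ambiguity: expected entropy of the likelihood mapping
    H_A = -np.sum(A * np.log(A + 1e-16), axis=0)
    ambiguity = H_A @ qs_next
    return risk + ambiguity

# Toy 3-state, 3-observation model (hypothetical numbers)
A = np.eye(3) * 0.8 + 0.1                 # likelihood p(o|s), near-identity
A /= A.sum(axis=0)
B = [np.roll(np.eye(3), k, axis=0) for k in range(3)]  # one transition per action
qs = np.array([1.0, 0.0, 0.0])            # current belief: state 0
C = softmax(np.array([0.0, 0.0, 4.0]))    # strong preference for observation 2

G = np.array([expected_free_energy(qs, A, B[a], C) for a in range(3)])
best_action = int(np.argmin(G))           # action 2 drives the agent toward state 2
```

With a near-identity likelihood, ambiguity is the same for every action, so the choice is driven by risk alone and the agent selects the action that moves it toward its preferred observation.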
Fused Lasso Improves Accuracy of Co-occurrence Network Inference in Grouped Samples
Daniel Agyapong, Briana H. Beatty, Peter G. Kennedy, Toby D. Hocking
https://arxiv.org/abs/2509.09413
READER: Retrieval-Assisted Drafter for Efficient LLM Inference
Maxim Divilkovskiy, Vitaly Malygin, Sergey Zlobin, Sultan Isali, Vasily Kalugin, Stanislav Ilyushin, Nuriza Aitassova, Yi Fei, Zeng Weidi
https://arxiv.org/abs/2508.09072
Efficient Autoregressive Inference for Transformer Probabilistic Models
Conor Hassan, Nasrulloh Loka, Cen-You Li, Daolang Huang, Paul E. Chang, Yang Yang, Francesco Silvestrin, Samuel Kaski, Luigi Acerbi
https://arxiv.org/abs/2510.09477
Robust and Efficient Semiparametric Inference for the Stepped Wedge Design
Fan Xia, K. C. Gary Chan, Emily Voldal, Avi Kenny, Patrick J. Heagerty, James P. Hughes
https://arxiv.org/abs/2510.08972
Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference
Haoran Wu, Can Xiao, Jiayi Nie, Xuan Guo, Binglei Lou, Jeffrey T. H. Wong, Zhiwen Mo, Cheng Zhang, Przemyslaw Forys, Wayne Luk, Hongxiang Fan, Jianyi Cheng, Timothy M. Jones, Rika Antonova, Robert Mullins, Aaron Zhao
https://arxiv.org/abs/2509.09505
Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson -- Extended
Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle
https://arxiv.org/abs/2508.08430
FIER: Fine-Grained and Efficient KV Cache Retrieval for Long-context LLM Inference
Dongwei Wang, Zijie Liu, Song Wang, Yuxin Ren, Jianing Deng, Jingtong Hu, Tianlong Chen, Huanrui Yang
https://arxiv.org/abs/2508.08256
Cosmology inference with perturbative forward modeling at the field level: a comparison with joint power spectrum and bispectrum analyses
Kazuyuki Akitsu, Marko Simonović, Shi-Fan Chen, Giovanni Cabass, Matias Zaldarriaga
https://arxiv.org/abs/2509.09673
Toward Optimal Statistical Inference in Noisy Linear Quadratic Reinforcement Learning over a Finite Horizon
Bo Pan, Jianya Lu, Yafei Wang, Hao Li, Bei Jiang, Linglong Kong
https://arxiv.org/abs/2508.08436
In situ estimation of the acoustic surface impedance using simulation-based inference
Jonas M. Schmid, Johannes D. Schmid, Martin Eser, Steffen Marburg
https://arxiv.org/abs/2509.08873
Jupiter: Enhancing LLM Data Analysis Capabilities via Notebook and Inference-Time Value-Guided Search
Shuocheng Li, Yihao Liu, Silin Du, Wenxuan Zeng, Zhe Xu, Mengyu Zhou, Yeye He, Haoyu Dong, Shi Han, Dongmei Zhang
https://arxiv.org/abs/2509.09245
FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference
Yu-Chen Lu, Chong-Yan Chen, Chi-Chih Chang, Yu-Fang Hu, Kai-Chiang Wu
https://arxiv.org/abs/2510.09332
When to Reason: Semantic Router for vLLM
Chen Wang, Xunzhuo Liu, Yuhan Liu, Yue Zhu, Xiangxi Mo, Junchen Jiang, Huamin Chen
https://arxiv.org/abs/2510.08731
ENSI: Efficient Non-Interactive Secure Inference for Large Language Models
Zhiyu He, Maojiang Wang, Xinwen Gao, Yuchuan Luo, Lin Liu, Shaojing Fu
https://arxiv.org/abs/2509.09424
The Impact of Device Type, Data Practices, and Use Case Scenarios on Privacy Concerns about Eye-tracked Augmented Reality in the United States and Germany
Efe Bozkir, Babette Bühler, Xiaoyuan Wu, Enkelejda Kasneci, Lujo Bauer, Lorrie Faith Cranor
https://arxiv.org/abs/2509.09285
A Design-based Solution for Causal Inference with Text: Can a Language Model Be Too Large?
Graham Tierney, Srikar Katta, Christopher Bail, Sunshine Hillygus, Alexander Volfovsky
https://arxiv.org/abs/2510.08758
LIMFAST. IV. Learning High-Redshift Galaxy Formation from Multiline Intensity Mapping with Implicit Likelihood Inference
Guochao Sun, Tri Nguyen, Claude-André Faucher-Giguère, Adam Lidz, Tjitske Starkenburg, Bryan R. Scott, Tzu-Ching Chang, Steven R. Furlanetto
https://arxiv.org/abs/2509.07060
Unsupervised full-field Bayesian inference of orthotropic hyperelasticity from a single biaxial test: a myocardial case study
Rogier P. Krijnen, Akshay Joshi, Siddhant Kumar, Mathias Peirlinck
https://arxiv.org/abs/2510.09498
Mask Tokens as Prophet: Fine-Grained Cache Eviction for Efficient dLLM Inference
Jianuo Huang, Yaojie Zhang, Yicun Yang, Benhao Huang, Biqing Qi, Dongrui Liu, Linfeng Zhang
https://arxiv.org/abs/2510.09309
Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference
Kexin Chu, Zecheng Lin, Dawei Xiang, Zixu Shen, Jianchang Su, Cheng Chu, Yiwei Yang, Wenhui Zhang, Wenfei Wu, Wei Zhang
https://arxiv.org/abs/2508.08438
SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system
Jialiang Shi, Yaguang Dou, Tian Qi
https://arxiv.org/abs/2508.09090
Adaptive Pareto-Optimal Token Merging for Edge Transformer Models in Semantic Communication
Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis
https://arxiv.org/abs/2509.09168
Restoring detailed balance in non-Hermitian Markov processes
Tim Van Wesemael, Gilberto Nakamura, Jan Baetens, Odemir M. Bruno, Alexandre S. Martinez, Christophe Deroulers
https://arxiv.org/abs/2510.09467
Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution
Shulai Zhang, Ao Xu, Quan Chen, Han Zhao, Weihao Cui, Ningxin Zheng, Haibin Lin, Xin Liu, Minyi Guo
https://arxiv.org/abs/2509.09560
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning
Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana, Javier Ortega-Garcia
https://arxiv.org/abs/2509.07879
DuoServe-MoE: Dual-Phase Expert Prefetch and Cache Scheduling for Efficient MoE LLM Inference
Yuning Zhang, Grant Pinkert, Nan Yang, Yanli Li, Dong Yuan
https://arxiv.org/abs/2509.07379
SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference
Hengrui Zhang, Pratyush Patel, August Ning, David Wentzlaff
https://arxiv.org/abs/2510.08544
Baseten, which helps companies launch open-source or custom AI models, raised a $150M Series D led by Bond at a $2.15B valuation, up from $825M in February (Allie Garfinkle/Fortune)
https://fortune.com/2025/09/05/exclusive-b…
Handling Open-Vocabulary Constructs in Formalizing Specifications: Retrieval-Augmented Parsing with Expert Knowledge
Mohammad Saqib Hasan, Sayontan Ghosh, Dhruv Verma, Geoff Kuenning, Erez Zadok, Scott A. Smolka, Niranjan Balasubramanian
https://arxiv.org/abs/2509.08808
Uncertainty Quantification for Multi-level Models Using the Survey-Weighted Pseudo-Posterior
Matthew R. Williams, F. Hunter McGuire, Terrance D. Savitsky
https://arxiv.org/abs/2510.09401
DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech
Ngoc-Son Nguyen, Hieu-Nghia Huynh-Nguyen, Thanh V. T. Tran, Truong-Son Hy, Van Nguyen
https://arxiv.org/abs/2509.09631
Dynamic Automated Deduction by Contradiction Separation: The Standard Extension Algorithm
Yang Xu, Xingxing He, Shuwei Chen, Jun Liu, Xiaomei Zhong
https://arxiv.org/abs/2510.08468
Stick-Breaking Mixture Normalizing Flows with Component-Wise Tail Adaptation for Variational Inference
Seungsu Han, Juyoung Hwang, Won Chang
https://arxiv.org/abs/2510.07965
Taking the Weight Off: Mitigating Parameter Bias from Catastrophic Outliers in 3×2pt Analysis
Carolyn McDonald Mill, C. Danielle Leonard, Markus Michael Rau, Cora Uhlemann, Shahab Joudaki
https://arxiv.org/abs/2509.08052
High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator
Kuan-Ting Lin, Ching-Te Chiu, Jheng-Yi Chang, Shi-Zong Huang, Yu-Ting Li
https://arxiv.org/abs/2509.05688
SVN-ICP: Uncertainty Estimation of ICP-based LiDAR Odometry using Stein Variational Newton
Shiping Ma, Haoming Zhang, Marc Toussaint
https://arxiv.org/abs/2509.08069
Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference
Guoqing Ma, Jia Zhu, Hanghui Guo, Weijie Shi, Jiawei Shen, Jingjiang Liu, Yidan Liang
https://arxiv.org/abs/2509.08682
Replaced article(s) found for cs.DC. https://arxiv.org/list/cs.DC/new
[1/1]:
- Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows
Thiago Garrett, Weijia Song, Roman Vitenberg, Ken Birman
FriendliAI, which aims to help companies run AI model inference faster and cheaper, raised a $20M extension to its $6M seed fund from late 2021 (Mary Ann Azevedo/Crunchbase News)
https://news.crunchbase.com/ai/inference-platform-friendliai-raises-seed…
An Interval Type-2 Version of Bayes Theorem Derived from Interval Probability Range Estimates Provided by Subject Matter Experts
John T. Rickard, William A. Dembski, James Rickards
https://arxiv.org/abs/2509.08834
Cosmology Likelihood for Observables in Euclid (CLOE). 1. Theoretical recipe
Collaboration, Cardone, Joudaki, Blot, Bonici, Camera, Cañas-Herrera, Carrilho, Casas, Davini, Di Domizio, Farrens, Goh, Beauchamps, Ilić, Keil, Le Brun, Martinelli, Moretti, Pettorino, Pezzotta, Sánchez, Sakr, Sciotti, Tanidis, Tutusaus, Ajani, Crocce, Giocoli, Legrand, Lembo, Lesci, Girones, Nouri-Zonoz, Pamuk, Tsedrik, Bel, Carbone, Duncan, Kilbinger, Lacasa, Lattanzi, Sapone, Sellentin, Tayl…
CCF: A Context Compression Framework for Efficient Long-Sequence Language Modeling
Wenhao Li, Bangcheng Sun, Weihao Ye, Tianyi Zhang, Daohai Yu, Fei Chao, Rongrong Ji
https://arxiv.org/abs/2509.09199
BitROM: Weight Reload-Free CiROM Architecture Towards Billion-Parameter 1.58-bit LLM Inference
Wenlun Zhang, Xinyu Li, Shimpei Ando, Kentaro Yoshioka
https://arxiv.org/abs/2509.08542
MOSAIC: Minimax-Optimal Sparsity-Adaptive Inference for Change Points in Dynamic Networks
Yingying Fan, Jingyuan Liu, Jinchi Lv, Ao Sun
https://arxiv.org/abs/2509.06303
RadioFlow: Efficient Radio Map Construction Framework with Flow Matching
Haozhe Jia, Wenshuo Chen, Xiucheng Wang, Nan Cheng, Hongbo Zhang, Kuimou Yu, Songning Lai, Nanjian Jia, Bowen Tian, Hongru Xiao, Yutao Yue
https://arxiv.org/abs/2510.09314
OSCAR: Orthogonal Stochastic Control for Alignment-Respecting Diversity in Flow Matching
Jingxuan Wu, Zhenglin Wan, Xingrui Yu, Yuzhe Yang, Bo An, Ivor Tsang
https://arxiv.org/abs/2510.09060
Comparison of Fully Homomorphic Encryption and Garbled Circuit Techniques in Privacy-Preserving Machine Learning Inference
Kalyan Cheerla (University of North Texas), Lotfi Ben Othmane (University of North Texas), Kirill Morozov (University of North Texas)
https://arxiv.org/abs/2510.07457
HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks
Zheng Xiong, Kang Li, Zilin Wang, Matthew Jackson, Jakob Foerster, Shimon Whiteson
https://arxiv.org/abs/2510.04898
PAC Reasoning: Controlling the Performance Loss for Efficient Reasoning
Hao Zeng, Jianguo Huang, Bingyi Jing, Hongxin Wei, Bo An
https://arxiv.org/abs/2510.09133
Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices
Ya Zou, Jingfeng Yao, Siyuan Yu, Shuai Zhang, Wenyu Liu, Xinggang Wang
https://arxiv.org/abs/2508.09136
Fisher Random Walk: Automatic Debiasing Contextual Preference Inference for Large Language Model Evaluation
Yichi Zhang, Alexander Belloni, Ethan X. Fang, Junwei Lu, Xiaoan Xu
https://arxiv.org/abs/2509.05852
Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization
Jason Bohne, Pawel Polak, David Rosenberg, Brian Bloniarz, Gary Kazantsev
https://arxiv.org/abs/2510.08256
Hybrid Models for Natural Language Reasoning: The Case of Syllogistic Logic
Manuel Vargas Guzmán, Jakub Szymanik, Maciej Malicki
https://arxiv.org/abs/2510.09472
Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs
Hyungjin Chung, Hyelin Nam, Jiyeon Kim, Hyojun Go, Byeongjun Park, Junho Kim, Joonseok Lee, Seongsu Ha, Byung-Hoon Kim
https://arxiv.org/abs/2509.08016
Sensitivity Analysis to Unobserved Confounding with Copula-based Normalizing Flows
Sourabh Balgi, Marc Braun, Jose M. Peña, Adel Daoud
https://arxiv.org/abs/2508.08752
Steering MoE LLMs via Expert (De)Activation
Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Ryan Rossi, Trung Bui, Hinrich Schütze, Nanyun Peng
https://arxiv.org/abs/2509.09660
Membership Inference Attacks on Tokenizers of Large Language Models
Meng Tong, Yuntao Du, Kejiang Chen, Weiming Zhang, Ninghui Li
https://arxiv.org/abs/2510.05699
Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling
Qiwei Di, Kaixuan Ji, Xuheng Li, Heyang Zhao, Quanquan Gu
https://arxiv.org/abs/2510.03199
Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During Training
Anthony P. Addison, Felix Wagner, Wentian Xu, Natalie Voets, Konstantinos Kamnitsas
https://arxiv.org/abs/2509.09290
Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging
Qixiang Yin, Huanjin Yao, Jianghao Chen, Jiaxing Huang, Zhicheng Zhao, Fei Su
https://arxiv.org/abs/2510.08987
MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
Songkai Ma, Zhaorui Zhang, Sheng Di, Benben Liu, Xiaodong Yu, Xiaoyi Lu, Dan Wang
https://arxiv.org/abs/2509.07727
DCMI: A Differential Calibration Membership Inference Attack Against Retrieval-Augmented Generation
Xinyu Gao, Xiangtao Meng, Yingkai Dong, Zheng Li, Shanqing Guo
https://arxiv.org/abs/2509.06026
Dynamic Features Adaptation in Networking: Toward Flexible training and Explainable inference
Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, Merim Dzaferagic, John D. Kelleher
https://arxiv.org/abs/2510.08303
Staircase Streaming for Low-Latency Multi-Agent Inference
Junlin Wang, Jue Wang, Zhen Xu, Ben Athiwaratkun, Bhuwan Dhingra, Ce Zhang, James Zou
https://arxiv.org/abs/2510.05059
Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Yuxuan Bai, Gauri Pradhan, Marlon Tobaben, Antti Honkela
https://arxiv.org/abs/2510.05753