SemiAnalysis launches InferenceMAX, an open-source benchmark that automatically tracks LLM inference performance across AI models and frameworks every night (Kimbo Chen/SemiAnalysis)
https://newsletter.semianalysis.com/p/inferencemax-open-source-inference
Towards Confidential and Efficient LLM Inference with Dual Privacy Protection
Honglan Yu, Yibin Wang, Feifei Dai, Dong Liu, Haihui Fan, Xiaoyan Gu
https://arxiv.org/abs/2509.09091
Zero-shot Structure Learning and Planning for Autonomous Robot Navigation using Active Inference
Daria de Tinguy, Tim Verbelen, Emilio Gamba, Bart Dhoedt
https://arxiv.org/abs/2510.09574

Autonomous navigation in unfamiliar environments requires robots to simultaneously explore, localise, and plan under uncertainty, without relying on predefined maps or extensive training. We present a biologically inspired, Active Inference-based framework, Active Inference MAPping and Planning (AIMAPP). This model unifies mapping, localisation, and decision-making within a single generative model. Inspired by hippocampal navigation, it uses topological reasoning, place-cell encoding, and episo…
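The abstract above describes action selection by active inference: the agent holds a belief over states and picks the action minimizing expected free energy (risk plus ambiguity). AIMAPP's actual model is not reproduced here; the following is a generic toy sketch of that loop over a discrete state/observation model, with all matrices and numbers hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def expected_free_energy(qs, A, B_a, preferred_obs):
    """Expected free energy of one action: risk + ambiguity."""
    qs_next = B_a @ qs              # predicted state belief after the action
    qo = A @ qs_next                # predicted observation distribution
    # Risk: KL divergence from the preferred observation distribution
    risk = np.sum(qo * (np.log(qo + 1e-16) - np.log(preferred_obs + 1e-16)))
    # Ambiguity: expected entropy of the likelihood mapping
    H_A = -np.sum(A * np.log(A + 1e-16), axis=0)
    ambiguity = H_A @ qs_next
    return risk + ambiguity

# Toy 3-state, 3-observation model (hypothetical numbers)
A = np.eye(3) * 0.8 + 0.1                 # likelihood p(o|s), near-identity
A /= A.sum(axis=0)
B = [np.roll(np.eye(3), k, axis=0) for k in range(3)]  # one transition per action
qs = np.array([1.0, 0.0, 0.0])            # current belief: state 0
C = softmax(np.array([0.0, 0.0, 4.0]))    # strong preference for observation 2

G = np.array([expected_free_energy(qs, A, B[a], C) for a in range(3)])
best_action = int(np.argmin(G))           # action 2 drives the agent toward state 2
```

With a near-identity likelihood, ambiguity is the same for every action, so the choice is driven by risk alone and the agent selects the action that moves it toward its preferred observation.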
Fused Lasso Improves Accuracy of Co-occurrence Network Inference in Grouped Samples
Daniel Agyapong, Briana H. Beatty, Peter G. Kennedy, Toby D. Hocking
https://arxiv.org/abs/2509.09413
READER: Retrieval-Assisted Drafter for Efficient LLM Inference
Maxim Divilkovskiy, Vitaly Malygin, Sergey Zlobin, Sultan Isali, Vasily Kalugin, Stanislav Ilyushin, Nuriza Aitassova, Yi Fei, Zeng Weidi
https://arxiv.org/abs/2508.09072
Efficient Autoregressive Inference for Transformer Probabilistic Models
Conor Hassan, Nasrulloh Loka, Cen-You Li, Daolang Huang, Paul E. Chang, Yang Yang, Francesco Silvestrin, Samuel Kaski, Luigi Acerbi
https://arxiv.org/abs/2510.09477
Robust and Efficient Semiparametric Inference for the Stepped Wedge Design
Fan Xia, K. C. Gary Chan, Emily Voldal, Avi Kenny, Patrick J. Heagerty, James P. Hughes
https://arxiv.org/abs/2510.08972
Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference
Haoran Wu, Can Xiao, Jiayi Nie, Xuan Guo, Binglei Lou, Jeffrey T. H. Wong, Zhiwen Mo, Cheng Zhang, Przemyslaw Forys, Wayne Luk, Hongxiang Fan, Jianyi Cheng, Timothy M. Jones, Rika Antonova, Robert Mullins, Aaron Zhao
https://arxiv.org/abs/2509.09505
Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson -- Extended
Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle
https://arxiv.org/abs/2508.08430
FIER: Fine-Grained and Efficient KV Cache Retrieval for Long-context LLM Inference
Dongwei Wang, Zijie Liu, Song Wang, Yuxin Ren, Jianing Deng, Jingtong Hu, Tianlong Chen, Huanrui Yang
https://arxiv.org/abs/2508.08256
Cosmology inference with perturbative forward modeling at the field level: a comparison with joint power spectrum and bispectrum analyses
Kazuyuki Akitsu, Marko Simonović, Shi-Fan Chen, Giovanni Cabass, Matias Zaldarriaga
https://arxiv.org/abs/2509.09673
Toward Optimal Statistical Inference in Noisy Linear Quadratic Reinforcement Learning over a Finite Horizon
Bo Pan, Jianya Lu, Yafei Wang, Hao Li, Bei Jiang, Linglong Kong
https://arxiv.org/abs/2508.08436
In situ estimation of the acoustic surface impedance using simulation-based inference
Jonas M. Schmid, Johannes D. Schmid, Martin Eser, Steffen Marburg
https://arxiv.org/abs/2509.08873
Jupiter: Enhancing LLM Data Analysis Capabilities via Notebook and Inference-Time Value-Guided Search
Shuocheng Li, Yihao Liu, Silin Du, Wenxuan Zeng, Zhe Xu, Mengyu Zhou, Yeye He, Haoyu Dong, Shi Han, Dongmei Zhang
https://arxiv.org/abs/2509.09245
FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference
Yu-Chen Lu, Chong-Yan Chen, Chi-Chih Chang, Yu-Fang Hu, Kai-Chiang Wu
https://arxiv.org/abs/2510.09332
When to Reason: Semantic Router for vLLM
Chen Wang, Xunzhuo Liu, Yuhan Liu, Yue Zhu, Xiangxi Mo, Junchen Jiang, Huamin Chen
https://arxiv.org/abs/2510.08731
ENSI: Efficient Non-Interactive Secure Inference for Large Language Models
Zhiyu He, Maojiang Wang, Xinwen Gao, Yuchuan Luo, Lin Liu, Shaojing Fu
https://arxiv.org/abs/2509.09424
The Impact of Device Type, Data Practices, and Use Case Scenarios on Privacy Concerns about Eye-tracked Augmented Reality in the United States and Germany
Efe Bozkir, Babette Bühler, Xiaoyuan Wu, Enkelejda Kasneci, Lujo Bauer, Lorrie Faith Cranor
https://arxiv.org/abs/2509.09285
A Design-based Solution for Causal Inference with Text: Can a Language Model Be Too Large?
Graham Tierney, Srikar Katta, Christopher Bail, Sunshine Hillygus, Alexander Volfovsky
https://arxiv.org/abs/2510.08758
LIMFAST. IV. Learning High-Redshift Galaxy Formation from Multiline Intensity Mapping with Implicit Likelihood Inference
Guochao Sun, Tri Nguyen, Claude-André Faucher-Giguère, Adam Lidz, Tjitske Starkenburg, Bryan R. Scott, Tzu-Ching Chang, Steven R. Furlanetto
https://arxiv.org/abs/2509.07060
Unsupervised full-field Bayesian inference of orthotropic hyperelasticity from a single biaxial test: a myocardial case study
Rogier P. Krijnen, Akshay Joshi, Siddhant Kumar, Mathias Peirlinck
https://arxiv.org/abs/2510.09498
Mask Tokens as Prophet: Fine-Grained Cache Eviction for Efficient dLLM Inference
Jianuo Huang, Yaojie Zhang, Yicun Yang, Benhao Huang, Biqing Qi, Dongrui Liu, Linfeng Zhang
https://arxiv.org/abs/2510.09309
Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference
Kexin Chu, Zecheng Lin, Dawei Xiang, Zixu Shen, Jianchang Su, Cheng Chu, Yiwei Yang, Wenhui Zhang, Wenfei Wu, Wei Zhang
https://arxiv.org/abs/2508.08438
SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system
Jialiang Shi, Yaguang Dou, Tian Qi
https://arxiv.org/abs/2508.09090
Adaptive Pareto-Optimal Token Merging for Edge Transformer Models in Semantic Communication
Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis
https://arxiv.org/abs/2509.09168
Restoring detailed balance in non-Hermitian Markov processes
Tim Van Wesemael, Gilberto Nakamura, Jan Baetens, Odemir M. Bruno, Alexandre S. Martinez, Christophe Deroulers
https://arxiv.org/abs/2510.09467
Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution
Shulai Zhang, Ao Xu, Quan Chen, Han Zhao, Weihao Cui, Ningxin Zheng, Haibin Lin, Xin Liu, Minyi Guo
https://arxiv.org/abs/2509.09560
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning
Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana, Javier Ortega-Garcia
https://arxiv.org/abs/2509.07879
DuoServe-MoE: Dual-Phase Expert Prefetch and Cache Scheduling for Efficient MoE LLM Inference
Yuning Zhang, Grant Pinkert, Nan Yang, Yanli Li, Dong Yuan
https://arxiv.org/abs/2509.07379
SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference
Hengrui Zhang, Pratyush Patel, August Ning, David Wentzlaff
https://arxiv.org/abs/2510.08544
Baseten, which helps companies launch open-source or custom AI models, raised a $150M Series D led by Bond at a $2.15B valuation, up from $825M in February (Allie Garfinkle/Fortune)
https://fortune.com/2025/09/05/exclusive-b…
Handling Open-Vocabulary Constructs in Formalizing Specifications: Retrieval-Augmented Parsing with Expert Knowledge
Mohammad Saqib Hasan, Sayontan Ghosh, Dhruv Verma, Geoff Kuenning, Erez Zadok, Scott A. Smolka, Niranjan Balasubramanian
https://arxiv.org/abs/2509.08808
Uncertainty Quantification for Multi-level Models Using the Survey-Weighted Pseudo-Posterior
Matthew R. Williams, F. Hunter McGuire, Terrance D. Savitsky
https://arxiv.org/abs/2510.09401
DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech
Ngoc-Son Nguyen, Hieu-Nghia Huynh-Nguyen, Thanh V. T. Tran, Truong-Son Hy, Van Nguyen
https://arxiv.org/abs/2509.09631
Dynamic Automated Deduction by Contradiction Separation: The Standard Extension Algorithm
Yang Xu, Xingxing He, Shuwei Chen, Jun Liu, Xiaomei Zhong
https://arxiv.org/abs/2510.08468
Stick-Breaking Mixture Normalizing Flows with Component-Wise Tail Adaptation for Variational Inference
Seungsu Han, Juyoung Hwang, Won Chang
https://arxiv.org/abs/2510.07965
Taking the Weight Off: Mitigating Parameter Bias from Catastrophic Outliers in 3×2pt Analysis
Carolyn McDonald Mill, C. Danielle Leonard, Markus Michael Rau, Cora Uhlemann, Shahab Joudaki
https://arxiv.org/abs/2509.08052
High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator
Kuan-Ting Lin, Ching-Te Chiu, Jheng-Yi Chang, Shi-Zong Huang, Yu-Ting Li
https://arxiv.org/abs/2509.05688
SVN-ICP: Uncertainty Estimation of ICP-based LiDAR Odometry using Stein Variational Newton
Shiping Ma, Haoming Zhang, Marc Toussaint
https://arxiv.org/abs/2509.08069
Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference
Guoqing Ma, Jia Zhu, Hanghui Guo, Weijie Shi, Jiawei Shen, Jingjiang Liu, Yidan Liang
https://arxiv.org/abs/2509.08682
Replaced article(s) found for cs.DC. https://arxiv.org/list/cs.DC/new
[1/1]:
- Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows
Thiago Garrett, Weijia Song, Roman Vitenberg, Ken Birman
FriendliAI, which aims to help companies run AI model inference faster and cheaper, raised a $20M extension to its $6M seed fund from late 2021 (Mary Ann Azevedo/Crunchbase News)
https://news.crunchbase.com/ai/inference-platform-friendliai-raises-seed…
An Interval Type-2 Version of Bayes Theorem Derived from Interval Probability Range Estimates Provided by Subject Matter Experts
John T. Rickard, William A. Dembski, James Rickards
https://arxiv.org/abs/2509.08834
Cosmology Likelihood for Observables in Euclid (CLOE). 1. Theoretical recipe
Collaboration, Cardone, Joudaki, Blot, Bonici, Camera, Cañas-Herrera, Carrilho, Casas, Davini, Di Domizio, Farrens, Goh, Beauchamps, Ilić, Keil, Le Brun, Martinelli, Moretti, Pettorino, Pezzotta, Sánchez, Sakr, Sciotti, Tanidis, Tutusaus, Ajani, Crocce, Giocoli, Legrand, Lembo, Lesci, Girones, Nouri-Zonoz, Pamuk, Tsedrik, Bel, Carbone, Duncan, Kilbinger, Lacasa, Lattanzi, Sapone, Sellentin, Tayl…
CCF: A Context Compression Framework for Efficient Long-Sequence Language Modeling
Wenhao Li, Bangcheng Sun, Weihao Ye, Tianyi Zhang, Daohai Yu, Fei Chao, Rongrong Ji
https://arxiv.org/abs/2509.09199
BitROM: Weight Reload-Free CiROM Architecture Towards Billion-Parameter 1.58-bit LLM Inference
Wenlun Zhang, Xinyu Li, Shimpei Ando, Kentaro Yoshioka
https://arxiv.org/abs/2509.08542
MOSAIC: Minimax-Optimal Sparsity-Adaptive Inference for Change Points in Dynamic Networks
Yingying Fan, Jingyuan Liu, Jinchi Lv, Ao Sun
https://arxiv.org/abs/2509.06303
RadioFlow: Efficient Radio Map Construction Framework with Flow Matching
Haozhe Jia, Wenshuo Chen, Xiucheng Wang, Nan Cheng, Hongbo Zhang, Kuimou Yu, Songning Lai, Nanjian Jia, Bowen Tian, Hongru Xiao, Yutao Yue
https://arxiv.org/abs/2510.09314
OSCAR: Orthogonal Stochastic Control for Alignment-Respecting Diversity in Flow Matching
Jingxuan Wu, Zhenglin Wan, Xingrui Yu, Yuzhe Yang, Bo An, Ivor Tsang
https://arxiv.org/abs/2510.09060
Comparison of Fully Homomorphic Encryption and Garbled Circuit Techniques in Privacy-Preserving Machine Learning Inference
Kalyan Cheerla (University of North Texas), Lotfi Ben Othmane (University of North Texas), Kirill Morozov (University of North Texas)
https://arxiv.org/abs/2510.07457
HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks
Zheng Xiong, Kang Li, Zilin Wang, Matthew Jackson, Jakob Foerster, Shimon Whiteson
https://arxiv.org/abs/2510.04898
PAC Reasoning: Controlling the Performance Loss for Efficient Reasoning
Hao Zeng, Jianguo Huang, Bingyi Jing, Hongxin Wei, Bo An
https://arxiv.org/abs/2510.09133
Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices
Ya Zou, Jingfeng Yao, Siyuan Yu, Shuai Zhang, Wenyu Liu, Xinggang Wang
https://arxiv.org/abs/2508.09136
Fisher Random Walk: Automatic Debiasing Contextual Preference Inference for Large Language Model Evaluation
Yichi Zhang, Alexander Belloni, Ethan X. Fang, Junwei Lu, Xiaoan Xu
https://arxiv.org/abs/2509.05852
Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization
Jason Bohne, Pawel Polak, David Rosenberg, Brian Bloniarz, Gary Kazantsev
https://arxiv.org/abs/2510.08256
Hybrid Models for Natural Language Reasoning: The Case of Syllogistic Logic
Manuel Vargas Guzmán, Jakub Szymanik, Maciej Malicki
https://arxiv.org/abs/2510.09472
Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs
Hyungjin Chung, Hyelin Nam, Jiyeon Kim, Hyojun Go, Byeongjun Park, Junho Kim, Joonseok Lee, Seongsu Ha, Byung-Hoon Kim
https://arxiv.org/abs/2509.08016
Sensitivity Analysis to Unobserved Confounding with Copula-based Normalizing Flows
Sourabh Balgi, Marc Braun, Jose M. Peña, Adel Daoud
https://arxiv.org/abs/2508.08752
Steering MoE LLMs via Expert (De)Activation
Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Ryan Rossi, Trung Bui, Hinrich Schütze, Nanyun Peng
https://arxiv.org/abs/2509.09660
Membership Inference Attacks on Tokenizers of Large Language Models
Meng Tong, Yuntao Du, Kejiang Chen, Weiming Zhang, Ninghui Li
https://arxiv.org/abs/2510.05699
Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling
Qiwei Di, Kaixuan Ji, Xuheng Li, Heyang Zhao, Quanquan Gu
https://arxiv.org/abs/2510.03199
Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During Training
Anthony P. Addison, Felix Wagner, Wentian Xu, Natalie Voets, Konstantinos Kamnitsas
https://arxiv.org/abs/2509.09290
Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging
Qixiang Yin, Huanjin Yao, Jianghao Chen, Jiaxing Huang, Zhicheng Zhao, Fei Su
https://arxiv.org/abs/2510.08987
MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
Songkai Ma, Zhaorui Zhang, Sheng Di, Benben Liu, Xiaodong Yu, Xiaoyi Lu, Dan Wang
https://arxiv.org/abs/2509.07727
DCMI: A Differential Calibration Membership Inference Attack Against Retrieval-Augmented Generation
Xinyu Gao, Xiangtao Meng, Yingkai Dong, Zheng Li, Shanqing Guo
https://arxiv.org/abs/2509.06026
Dynamic Features Adaptation in Networking: Toward Flexible training and Explainable inference
Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, Merim Dzaferagic, John D. Kelleher
https://arxiv.org/abs/2510.08303
Staircase Streaming for Low-Latency Multi-Agent Inference
Junlin Wang, Jue Wang, Zhen Xu, Ben Athiwaratkun, Bhuwan Dhingra, Ce Zhang, James Zou
https://arxiv.org/abs/2510.05059
Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Yuxuan Bai, Gauri Pradhan, Marlon Tobaben, Antti Honkela
https://arxiv.org/abs/2510.05753