ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
Long Xing, Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jinsong Li, Shuangrui Ding, Weiming Zhang, Nenghai Yu, Jiaqi Wang, Feng Wu, Dahua Lin
https://arxiv.org/abs/2506.19848
Agentic AI framework for End-to-End Medical Data Inference
Soorya Ram Shimgekar, Shayan Vassef, Abhay Goyal, Navin Kumar, Koustuv Saha
https://arxiv.org/abs/2507.18115
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
Ammar Khairi, Daniel D'souza, Ye Shen, Julia Kreutzer, Sara Hooker
https://arxiv.org/abs/2506.20544
Evaluating the Defense Potential of Machine Unlearning against Membership Inference Attacks
Aristeidis Sidiropoulos, Christos Chrysanthos Nikolaidis, Theodoros Tsiolakis, Nikolaos Pavlidis, Vasilis Perifanis, Pavlos S. Efraimidis
https://arxiv.org/abs/2508.16150
Simulation-Based Inference for Direction Reconstruction of Ultra-High-Energy Cosmic Rays with Radio Arrays
Oscar Macias, Zachary Mason, Matthew Ho, Arsène Ferrière, Aurélien Benoit-Lévy, Matías Tueros
https://arxiv.org/abs/2508.15991
Validating Sequential Monte Carlo for Gravitational-Wave Inference
Michael J. Williams, Minas Karamanis, Yilin Luo, Uroš Seljak
https://arxiv.org/abs/2506.18977
ViRN: Variational Inference and Distribution Trilateration for Long-Tailed Continual Representation Learning
Hao Dai, Chong Tang, Jagmohan Chauhan
https://arxiv.org/abs/2507.17368
Minimax Data Sanitization with Distortion Constraint and Adversarial Inference
Amirarsalan Moatazedian, Yauhen Yakimenka, Rémi A. Chou, Jörg Kliewer
https://arxiv.org/abs/2507.17942
Bare-Metal RISC-V NVDLA SoC for Efficient Deep Learning Inference
Vineet Kumar (School of Electrical,Electronic Engineering, University College Dublin, Dublin, Ireland, Department of Electronic,Electrical Engineering, Trinity College Dublin, Dublin, Ireland), Ajay Kumar M (School of Electrical,Electronic Engineering, University College Dublin, Dublin, Ireland, Department of Electronic,Electrical Engineering, Trinity College Dublin, Dublin, Ireland), Yike Li (School of Electrical,Elec…
from my link log —
Damas-Hindley-Milner inference two ways.
https://bernsteinbear.com/blog/type-inference/
saved 2024-10-17 https…
Simulation based inference of the ionization history from the 2D 21 cm power spectrum
Nadia Cooper, Carina Norregaard, Romain Meriot, Jonathan R. Pritchard
https://arxiv.org/abs/2508.16329
Exact Conditional Score-Guided Generative Modeling for Amortized Inference in Uncertainty Quantification
Zezhong Zhang, Caroline Tatsuoka, Dongbin Xiu, Guannan Zhang
https://arxiv.org/abs/2506.18227
Probabilistic Inference for Datalog with Correlated Inputs
Jingbo Wang, Shashin Halalingaiah, Weiyi Chen, Chao Wang, Isil Dillig
https://arxiv.org/abs/2508.15166
Temporal Broadening of Attosecond Pulse Trains Induced by Multi-Band inference in Solid-State High-Order Harmonic Generation
Qing-Guo Fan, Kang Lai, Wen-hao Liu, Zhi Wang, Lin-Wang Wang, Jun-Wei Luo
https://arxiv.org/abs/2507.18019
WiLLM: An Open Wireless LLM Communication System
Boyi Liu, Yongguang Lu, Jianguo Zhao, Qiang Yang, Wen Wu, Lin Chen, Jagmohan Chauhan, Jun Zhang
https://arxiv.org/abs/2506.19030
Inference Scaled GraphRAG: Improving Multi Hop Question Answering on Knowledge Graphs
Travis Thompson, Seung-Hwan Lim, Paul Liu, Ruoying He, Dongkuan Xu
https://arxiv.org/abs/2506.19967
Tomography for Plasma Imaging: a Unifying Framework for Bayesian Inference
D. Hamm, C. Theiler, M. Simeoni, B. P. Duval, T. Debarre, L. Simons, J. R. Queralt
https://arxiv.org/abs/2506.20232
LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models
Delong Ran, Xinlei He, Tianshuo Cong, Anyu Wang, Qi Li, Xiaoyun Wang
https://arxiv.org/abs/2507.18302
GreenLLM: SLO-Aware Dynamic Frequency Scaling for Energy-Efficient LLM Serving
Qunyou Liu, Darong Huang, Marina Zapater, David Atienza
https://arxiv.org/abs/2508.16449
Hybrid quantum-classical algorithm for near-optimal planning in POMDPs
Gilberto Cunha, Alexandra Ramôa, André Sequeira, Michael de Oliveira, Luís Barbosa
https://arxiv.org/abs/2507.18606
Replaced article(s) found for cs.RO. https://arxiv.org/list/cs.RO/new
[1/2]:
- Stochastic Motion Planning as Gaussian Variational Inference: Theory and Algorithms
Hongzhe Yu, Yongxin Chen
SATORI: Static Test Oracle Generation for REST APIs
Juan C. Alonso, Alberto Martin-Lopez, Sergio Segura, Gabriele Bavota, Antonio Ruiz-Cortés
https://arxiv.org/abs/2508.16318
MEDEA: A Design-Time Multi-Objective Manager for Energy-Efficient DNN Inference on Heterogeneous Ultra-Low Power Platforms
Hossein Taji, José Miranda, Miguel Peón-Quirós, David Atienza
https://arxiv.org/abs/2506.19067
A Probabilistic Inference Scaling Theory for LLM Self-Correction
Zhe Yang, Yichang Zhang, Yudong Wang, Ziyao Xu, Junyang Lin, Zhifang Sui
https://arxiv.org/abs/2508.16456
Right a few days before I'll be talking about patterns in DRAM init at #GPN23, #Binarly are posting on their type inference tooling:
ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors
Junghyun Koo, Marco A. Martinez-Ramirez, Wei-Hsiang Liao, Giorgio Fabbro, Michele Mancusi, Yuki Mitsufuji
https://arxiv.org/abs/2506.16889
UniSegDiff: Boosting Unified Lesion Segmentation via a Staged Diffusion Model
Yilong Hu, Shijie Chang, Lihe Zhang, Feng Tian, Weibing Sun, Huchuan Lu
https://arxiv.org/abs/2507.18362
Waging a Campaign: Results from an Injection-Recovery Study involving 35 numerical Relativity Simulations and three Waveform Models
Sarp Akçay, Charlie Hoy, Jake Mac Uilliam
https://arxiv.org/abs/2506.19990
MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection
Zhengxiang Huang, Chaoyue Niu, Zhaode Wang, Jiarui Xue, Hanming Zhang, Yugang Wang, Zewei Xin, Xiaotang Jiang, Chengfei Lv, Fan Wu, Guihai Chen
https://arxiv.org/abs/2506.19884
Statistical Inference for Optimal Transport Maps: Recent Advances and Perspectives
Sivaraman Balakrishnan, Tudor Manole, Larry Wasserman
https://arxiv.org/abs/2506.19025
Decentralized Consensus Inference-based Hierarchical Reinforcement Learning for Multi-Constrained UAV Pursuit-Evasion Game
Xiang Yuming, Li Sizhao, Li Rongpeng, Zhao Zhifeng, Zhang Honggang
https://arxiv.org/abs/2506.18126
Inductive Domain Transfer In Misspecified Simulation-Based Inference
Ortal Senouf, Antoine Wehenkel, Cédric Vincent-Cuaz, Emmanuel Abbé, Pascal Frossard
https://arxiv.org/abs/2508.15593
Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models
Wanli Peng, Xin Chen, Hang Fu, XinYu He, Xue Yiming, Juan Wen
https://arxiv.org/abs/2506.19889
Scaling Group Inference for Diverse and High-Quality Generation
Gaurav Parmar, Or Patashnik, Daniil Ostashev, Kuan-Chieh Wang, Kfir Aberman, Srinivasa Narasimhan, Jun-Yan Zhu
https://arxiv.org/abs/2508.15773
SuperSONIC: Cloud-Native Infrastructure for ML Inferencing
Dmitry Kondratyev, Benedikt Riedel, Yuan-Tang Chou, Miles Cochran-Branson, Noah Paladino, David Schultz, Mia Liu, Javier Duarte, Philip Harris, Shih-Chieh Hsu
https://arxiv.org/abs/2506.20657
TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper
Zhongling Su, Rong Fu, Weihan Cao, Jianfei Gao, Minxi Jin, Zhilin Pei, Hui Wang
https://arxiv.org/abs/2508.16584
GPL-SLAM: A Laser SLAM Framework with Gaussian Process Based Extended Landmarks
Ali Emre Balcı (TU Delft), Erhan Ege Keyvan (Middle East Technical University), Emre Özkan (Middle East Technical University)
https://arxiv.org/abs/2508.16459
Replaced article(s) found for cs.IT. https://arxiv.org/list/cs.IT/new
[1/1]:
- Computation and Communication Co-scheduling for Multi-Task Remote Inference
Md Kamran Chowdhury Shisher, Adam Piaseczny, Yin Sun, Christopher G. Brinton
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
Eyal German, Sagiv Antebi, Daniel Samira, Asaf Shabtai, Yuval Elovici
https://arxiv.org/abs/2507.17259
Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates
Yang Liu, Yi Chen, Yongwei Zhao, Yifan Hao, Zifu Zheng, Weihao Kong, Zhangmai Li, Dongchen Jiang, Ruiyang Xia, Zhihong Ma, Zisheng Liu, Zhaoyong Wan, Yunqi Lu, Ximing Liu, Hongrui Guo, Zhihao Yang, Zhe Wang, Tianrui Ma, Mo Zou, Rui Zhang, Ling Li, Xing Hu, Zidong Du, Zhiwei Xu, Qi Guo, Tianshi Chen, Yunji Chen
Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers
Shikang Zheng, Liang Feng, Xinyu Wang, Qinming Zhou, Peiliang Cai, Chang Zou, Jiacheng Liu, Yuqi Lin, Junjie Chen, Yue Ma, Linfeng Zhang
https://arxiv.org/abs/2508.16211
Bridging the Gap in Ophthalmic AI: MM-Retinal-Reason Dataset and OphthaReason Model toward Dynamic Multimodal Reasoning
Ruiqi Wu, Yuang Yao, Tengfei Ma, Chenran Zhang, Na Su, Tao Zhou, Geng Chen, Wen Fan, Yi Zhou
https://arxiv.org/abs/2508.16129
BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs
Jianmin Hu, Minxian Xu, Kejiang Ye, Chengzhong Xu
https://arxiv.org/abs/2507.17133
SiLQ: Simple Large Language Model Quantization-Aware Training
Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha
https://arxiv.org/abs/2507.16933
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han
https://arxiv.org/abs/2506.19852
BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving
Wanyi Zheng, Minxian Xu, Shengye Song, Kejiang Ye
https://arxiv.org/abs/2507.17120
A Modular Multitask Reasoning Framework Integrating Spatio-temporal Models and LLMs
Kethmi Hirushini Hettige, Jiahao Ji, Cheng Long, Shili Xiang, Gao Cong, Jingyuan Wang
https://arxiv.org/abs/2506.20073
From Tiny Machine Learning to Tiny Deep Learning: A Survey
Shriyank Somvanshi, Md Monzurul Islam, Gaurab Chhetri, Rohit Chakraborty, Mahmuda Sultana Mimi, Swagat Ahmed Shuvo, Kazi Sifatul Islam, Syed Aaqib Javed, Sharif Ahmed Rafat, Anandi Dutta, Subasish Das
https://arxiv.org/abs/2506.18927
A Lightweight Group Multiscale Bidirectional Interactive Network for Real-Time Steel Surface Defect Detection
Yong Zhang, Cunjian Chen, Qiang Gao, Yi Wang, Bin Fang
https://arxiv.org/abs/2508.16397
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
Guinan Su, Li Shen, Lu Yin, Shiwei Liu, Yanwu Yang, Jonas Geiping
https://arxiv.org/abs/2506.20480
Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges
Senyao Li, Haozhao Wang, Wenchao Xu, Rui Zhang, Song Guo, Jingling Yuan, Xian Zhong, Tianwei Zhang, Ruixuan Li
https://arxiv.org/abs/2507.16731
PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty
Jinwen He, Yiyang Lu, Zijin Lin, Kai Chen, Yue Zhao
https://arxiv.org/abs/2506.19563
Exploiting Information Redundancy in Attention Maps for Extreme Quantization of Vision Transformers
Lucas Maisonnave, Karim Haroun, Tom Pegeot
https://arxiv.org/abs/2508.16311
On the Evolution of Federated Post-Training Large Language Models: A Model Accessibility View
Tao Guo, Junxiao Wang, Fushuo Huo, Laizhong Cui, Song Guo, Jie Gui, Dacheng Tao
https://arxiv.org/abs/2508.16261
Embedded FPGA Acceleration of Brain-Like Neural Networks: Online Learning to Scalable Inference
Muhammad Ihsan Al Hafiz, Naresh Ravichandran, Anders Lansner, Pawel Herman, Artur Podobas
https://arxiv.org/abs/2506.18530
HE-LRM: Encrypted Deep Learning Recommendation Models using Fully Homomorphic Encryption
Karthik Garimella, Austin Ebel, Gabrielle De Micheli, Brandon Reagen
https://arxiv.org/abs/2506.18150
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong, Yizhe Zhang
https://arxiv.org/abs/2506.20639
Cloud Native System for LLM Inference Serving
Minxian Xu, Junhan Liao, Jingfeng Wu, Yiyuan He, Kejiang Ye, Chengzhong Xu
https://arxiv.org/abs/2507.18007
Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need
Michael Davies, Neal Crago, Karthikeyan Sankaralingam, Christos Kozyrakis
https://arxiv.org/abs/2507.14397
Bayesian Inference for Left-Truncated Log-Logistic Distributions for Time-to-event Data Analysis
Fahad Mostafa, Md Rejuan Haque, Md Mostafijur Rahman, Farzana Nasrin
https://arxiv.org/abs/2506.17852
System Report for CCL25-Eval Task 10: SRAG-MAV for Fine-Grained Chinese Hate Speech Recognition
Jiahao Wang, Ramen Liu, Longhui Zhang, Jing Li
https://arxiv.org/abs/2507.18580
Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning
Ankan Deria, Adinath Madhavrao Dukre, Feilong Tang, Sara Atito, Sudipta Roy, Muhammad Awais, Muhammad Haris Khan, Imran Razzak
https://arxiv.org/abs/2506.15649
Flexible Vector Integration in Embedded RISC-V SoCs for End to End CNN Inference Acceleration
Dmitri Lyalikov
https://arxiv.org/abs/2507.17771
Efficient Mixed-Precision Large Language Model Inference with TurboMind
Li Zhang, Youhe Jiang, Guoliang He, Xin Chen, Han Lv, Qian Yao, Fangcheng Fu, Kai Chen
https://arxiv.org/abs/2508.15601
Inference on Nonlinear Counterfactual Functionals under a Multiplicative IV Model
Yonghoon Lee, Mengxin Yu, Jiewen Liu, Chan Park, Yunshu Zhang, James M. Robins, Eric J. Tchetgen Tchetgen
https://arxiv.org/abs/2507.15612
FCPO: Federated Continual Policy Optimization for Real-Time High-Throughput Edge Video Analytics
Lucas Liebe, Thanh-Tung Nguyen, Dongman Lee
https://arxiv.org/abs/2507.18047
Heterogeneous Quantile Treatment Effect Estimation for Longitudinal Data with High-Dimensional Confounding
Zhixin Qiu, Huichen Zhu, Wenjie Wang, Yanlin Tang
https://arxiv.org/abs/2508.16326
WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads
Hongzhen Huang, Kunming Zhang, Hanlong Liao, Kui Wu, Guoming Tang
https://arxiv.org/abs/2506.20535
Quasi Instrumental Variable Methods for Stable Hidden Confounding and Binary Outcome
Zhonghua Liu, Baoluo Sun, Ting Ye, David Richardson, Eric Tchetgen Tchetgen
https://arxiv.org/abs/2508.16096
Replaced article(s) found for cs.DC. https://arxiv.org/list/cs.DC/new
[1/1]:
- Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
Jiajun Luo, Lizhuo Luo, Jianru Xu, Jiajun Song, Rongwei Lu, Chen Tang, Zhi Wang
Principal stratification with recurrent events truncated by a terminal event: A nested Bayesian nonparametric approach
Yuki Ohnishi, Michael O. Harhay, Fan Li
https://arxiv.org/abs/2506.19015
Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI
Samyam Rajbhandari, Mert Hidayetoglu, Aurick Qiao, Ye Wang, Juncheng Yang, Jeff Rasley, Michael Wyatt, Yuxiong He
https://arxiv.org/abs/2507.11830

Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost. Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parallelism strategy that adapts to real-world traffic while integrating speculative decoding, SwiftKV compute reduction, and optimized embedding inference. It achieves up to 3.4 times faster request completion, 1.75 times faster generation, and 1.6M tokens/sec per …