2025-09-03 08:24:53
DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving
Mingyu Yang, Jae-Young Choi, Kihyo Moon, Minsung Jang, Eunjoo Joen
https://arxiv.org/abs/2509.01083 …
AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving
Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang
https://arxiv.org/abs/2509.00105
The Senate confirms former Fox News host and prosecutor Jeanine Pirro as US attorney for DC; she had been serving in the role on an interim basis since May (Nnamdi Egwuonwu/NBC News)
https://www.nbcnews.com/politics/congress/
35-year-old former U.S. Army sergeant, Bajun “Baji” Mavalwalla II,
faces up to six years in prison
for protesting against ICE deportations
in what legal experts are calling a test case for the
Trump administration’s attempts to criminalize and punish dissent.
Mavalwalla was arrested and charged with “conspiracy to impede or injure officers”
after he was identified in a video taken at the protest and shared on Instagram.
Mavalwalla, who survived a ro…
Two things are worth noticing right now:
1. The military brass *did not* respond well to Trump and Hegseth.
2. The deployment to #Portland keeps getting delayed.
The military will never say "no" to the president (unless he's literally ordering them to open fire on unarmed civilians or something equally obviously illegal). But there are ways to not comply that don't necessarily involve refusal. Brass showing that they aren't aligned with Trump may weaken his billionaire backers, who might be realizing now that weak dictators who can't lead their militaries tend to get toppled... and their oligarch-backers tend to end up against walls.
If folks being ordered to send troops to #PDX don't want to comply, delaying until there's an initial response from the lawsuit would be basically impossible to detect. The deployment to LA went far too fast, running into logistical challenges like troops sleeping on the floor. The delays we've already seen could indicate either a more careful approach or quiet resistance.
Trump will continue to escalate at every chance he gets. I would be surprised if PDX didn't give him a fight. I doubt the troops will become more interested in serving a guy who's stabbed them in the back and wasted their time at every opportunity.
It is still possible troops just won't deploy. Trump will make something up about how just the threat of an intervention was enough to make things safe or something like that. If we see that, it's 100% the military telling him to kick rocks because he's not competent enough to know when to back down.
Honestly, I think Trump wants revenge for the resistance PDX put up at the end of his last term. Any backing down from that is absolutely a big loss for him.
GenCompositor: Generative Video Compositing with Diffusion Transformer
Shuzhou Yang, Xiaoyu Li, Xiaodong Cun, Guangzhi Wang, Lingen Li, Ying Shan, Jian Zhang
https://arxiv.org/abs/2509.02460
These Republicans oppose DEI, but also cuts hitting Hispanic-serving colleges (Rachel Hatzipanagos/Washington Post)
https://www.washingtonpost.com/nation/2025/10/30/hispanic-serving-institutions-republican-support/
http://www.memeorandum.com/251030/p70#a251030p70
Due to the devastating harm being done to the USA by the current administration, and the administration of Texas I have donated again to the Texas Civil Rights Project https://www.txcivilrights.org (doubling my giving for 2025).
I will be doing the same for several other
LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving
Huanqi Hu, Bowen Xiao, Shixuan Sun, Jianian Yin, Zhexi Zhang, Xiang Luo, Chengquan Jiang, Weiqi Xu, Xiaoying Jia, Xin Liu, Minyi Guo
https://arxiv.org/abs/2509.01229
Batch Query Processing and Optimization for Agentic Workflows
Junyi Shen, Noppanat Wadlom, Yao Lu
https://arxiv.org/abs/2509.02121 https://arxiv.org/pdf/25…
Loop Quantum Vector-Tensor Gravity and Its Spherically Symmetric Model
Shengzhi Li, Yongge Ma
https://arxiv.org/abs/2509.02056 https://arxiv.org/pdf/2509.0…
The headline here is that
(1) the increasingly authoritarian US government demanded that Google suppress videos serving the public interest because those videos are embarrassing to government officials, and
(2) Google readily over-complied with that request, deleting not just the videos but the entire channel.
https://www.theguardian.com/us-news/2025/aug/31/fda-official-youtube-videos
exaPD: A highly parallelizable workflow for multi-element phase diagram (PD) construction
Feng Zhang, Zhuo Ye, Maxim Moraru, Ying Wai Li, Weiyi Xia, Yongxin Yao, Ryan Richard, Cai-Zhuang Wang
https://arxiv.org/abs/2510.01400
Signal Drop in Magnification Profiles: Combining Lensing Simulations and Observations
David Crespo, Joaquín González-Nuevo, Laura Bonavera, Marcos M. Cueli, Hu Zou, Rebeca Fernández-Fernández, Jose M. Casas
https://arxiv.org/abs/2509.02213
VarCoNet: A variability-aware self-supervised framework for functional connectome extraction from resting-state fMRI
Charalampos Lamprou, Aamna Alshehhi, Leontios J. Hadjileontiadis, Mohamed L. Seghier
https://arxiv.org/abs/2510.02120
FPGA-Based RoCEv2-RDMA Readout Electronics for the CTAO-LST Advanced Camera
F. Marini, M. Bellato, A. Bergnoli, D. Corti, A. Griggio, R. Isocrate, L. Modenese, M. Toffano, C. Arcaro, F. Di Pierro, M. Mariotti, M. Mi, P. Wang
https://arxiv.org/abs/2509.02285
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[1/7]:
- AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving
Feng, Li, Du, Gu, Liu, Yao, Ray, Shen, Cheng, Ananthanarayanan, Jiang
Comparative Analysis of Ant Colony Optimization and Google OR-Tools for Solving the Open Capacitated Vehicle Routing Problem in Logistics
Assem Omar, Youssef Omar, Marwa Solayman, Hesham Mansour
https://arxiv.org/abs/2509.26216
@… Yeah, I am not opposed to state varying across that level. Across applications, maintained by separate teams or even departments, of course! You don’t want to couple yourself to people who might have drastically different definitions of “success” to you.
But within a single application, serving a single purpose? Only if there’s no better way.…
Ukraine captures Kenyan serving in Russian army, who claims he was tricked into joining: https://benborges.xyz/2025/09/17/ukraine-captures-kenyan-serving-in.html
How Surveillance Firms Use ‘Democracy’ As a Cover for Serving ICE and Trump https://www.404media.co/how-surveillance-firms-use-democracy-as-a-cover-for-serving-ice-and-trump/
Another Trump-Appointed U.S. Attorney Found to be Serving Unlawfully, Federal Judge Rules (Yunior Rivas/Democracy Docket)
https://www.democracydocket.com/news-alerts/another-trump-appointed-u-s-attorney-found-to-be-serving-unlawfully-federal-judge-rules/
http://www.memeorandum.com/251028/p156#a251028p156
Virginia Tech coaching search: Bruce Arians, Super Bowl winner and former Hokies QB, serving as consultant
https://www.cbssports.com/college-foo…
I got Epsom Salt. That’s just uncanny. https://beige.party/@RickiTarr/115124087333264067
New #Pixelfed Post:
#Fediverse
The most up-to-date light curve of interstellar comet #ATLAS - https://x.com/AsteroidEnergy/status/1983707109045989551 - with sparse data points all the way to perihelion, https://bsky.app/profile/kwalsh4a.bsky.social/post/3m4bwiczncs2z from PUNCH serving the latest. So as seen from Earth the brightness is about 9.5 mag. - and should still be 10.something when the comet can be observed from the ground again, from about 10 November onwards.
So, we had the Chez Dallman Pecan Chicken Bok Choy again. I wanted to take my wife's picture, but she protested and said she never looks good under the glaring kitchen light. I pushed the settings button in the camera app and said wryly, "Ah, there it is: turning on 'Make My Wife Beautiful' mode!" And I pushed at the screen.
My wife cracked up. (I did too.) It worked. She likes the photo, or at least she doesn't hate it. I can't post it here, though…
VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference
Ke Wang, Felix Qu, Libin Xia, Zishuo Zhao, Chris Tong, Lynn Ai, Eric Yang
https://arxiv.org/abs/2509.24257
Evaluating SAP Joule for Code Generation
Joshua Heisler, Johannes Reisinger, Andreas Fischer
https://arxiv.org/abs/2509.24828 https://arxiv.org/pdf/2509.24…
A Predictive and Synergistic Two-Layer Scheduling Framework for LLM Serving
Yue Zhang, Yuansheng Chen, Xuan Mo, Alex Xi, Jialun Li, WeiGang Wu
https://arxiv.org/abs/2509.23384 h…
Report finds dangerous mercury levels, highlights mislabeling in shark meat sold in EU https://news.mongabay.com/2025/10/report-finds-dangerous-mercury-levels-highlights-mislabeling-i…
Chatbots Are Pushing Sanctioned Russian Propaganda
https://www.wired.com/story/chatbots-are-pushing-sanctioned-russian-propaganda/
Resource-efficient universal photonic processor based on time-multiplexed hybrid architectures
Jonas Lammers, Laura Ares, Federico Pegoraro, Philip Held, Benjamin Brecht, Jan Sperling, Christine Silberhorn
https://arxiv.org/abs/2509.22521
Fault Injection in On-Chip Interconnects: A Comparative Study of Wishbone, AXI-Lite, and AXI
Hongwei Zhao, Vianney Lapotre, Guy Gogniat
https://arxiv.org/abs/2509.24929 https://…
After two decades in Congress,
Darrell Issa’s career is all about serving the ultra-wealthy like himself
– and serving Donald Trump. It’s time to send him packing and take our country back.
On the City Council,
Marni von Wilpert
flipped San Diego’s reddest seat blue.
Now, she’s ready to do it again – in Congress.
https://www.
Rethinking Caching for LLM Serving Systems: Beyond Traditional Heuristics
Jungwoo Kim, Minsang Kim, Jaeheon Lee, Chanwoo Moon, Heejin Kim, Taeho Hwang, Woosuk Chung, Yeseong Kim, Sungjin Lee
https://arxiv.org/abs/2508.18736
SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
Qihui Zhou, Peiqi Yin, Pengfei Zuo, James Cheng
https://arxiv.org/abs/2509.24626 http…
Drop Site uncovered new information about individuals, donor networks, and businesses helping Canary Mission, a pro-Israel organization serving the U.S.'s deportation and repression efforts.
https://www.dropsitenews.com/p/canary-miss
Zero-Waiting Load Balancing with Heterogeneous Servers in Heavy Traffic
Xin Liu, Lei Ying
https://arxiv.org/abs/2509.23918 https://arxiv.org/pdf/2509.23918…
As Trump Defunds Infrastructure, Water Systems Serving Millions Face Flood Risk - WhoWhatWhy
https://whowhatwhy.org/science/environment/as-trump-defunds-infrastructure-water-systems-serving-millions-face-flood-risk/
Information Transmission in Quorum Sensing for Gut Microbiome
O. Tansel Baydas, Efe Yatgin, Ozgur B. Akan
https://arxiv.org/abs/2509.25057 https://arxiv.or…
happens to the best of us
https://blog.cloudflare.com/deep-dive-into-cloudflares-sept-12-dashboard-and-api-outage/
I've been thinking a bit about historians' use of theory. I've decided it's often similar to a comment a cafeteria worker made to me decades ago. They were serving chili dogs and I asked if I could just have a bowl of chili instead. "Oh no, dear," she said, "this is chili dog chili, it's not for eating." I've long suspected that a good chunk of my field thinks of theory along these lines. We put it on the menu and let our students consume it, even …
Measuring the environmental impact of delivering AI at Google Scale
Cooper Elsworth, Keguo Huang, David Patterson, Ian Schneider, Robert Sedivy, Savannah Goodman, Ben Townsend, Parthasarathy Ranganathan, Jeff Dean, Amin Vahdat, Ben Gomes, James Manyika
https://arxiv.org/abs/2508.15734
DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning
Jiayi Li, Flora D. Salim
https://arxiv.org/abs/2509.24868 https://arxiv.org/pdf/2509.24868…
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
Fan Yuan, Yuchen Yan, Yifan Jiang, Haoran Zhao, Tao Feng, Jinyan Chen, Yanwei Lou, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang
https://arxiv.org/abs/2509.25160
Particle Acceleration and Transport in the Large-scale Current Sheet under an Erupting Magnetic Flux Rope
Hao Wu, Yang Guo, Rony Keppens, Chun Xia, Yang Su, Xiangliang Kong, Mingde Ding
https://arxiv.org/abs/2509.22265
I appreciate the premise of reducing red meat in one's diet... but if you expect me to drop fish or seafood, you can go to hell.
✅ The Self-Importance of Luxury Dining: Eleven Madison Park is serving meat again—a sign of American tastes, and of fine-dining hubris - The Atlantic
https://archive.ph/BxO7Z
Nova: Real-Time Agentic Vision-Language Model Serving with Adaptive Cross-Stage Parallelization
Yuhang Xu, Shengzhong Liu, Dong Zhang, Bingheng Yan, Fan Wu, Guihai Chen
https://arxiv.org/abs/2509.21301
LLM Serving Optimization with Variable Prefill and Decode Lengths
Meixuan Wang, Yinyu Ye, Zijie Zhou
https://arxiv.org/abs/2508.06133 https://arxiv.org/pdf…
From https://bsky.app/profile/did:plc:cysfyu7ook326epmpvun2qtb/post/3lzs7noi6oc23 —
> Prof. Lee says that the problem for Canada Post is the smartphone because people send texts, not letters. Fine, but don't gut it. I say, make Canada Post a pub…
A federal judge ruled Thursday that Trump’s former lawyer,
Alina Habba, has been unlawfully serving as the top federal prosecutor in New Jersey since last month.
U.S. District Judge Matthew Brann held that Habba’s term as the interim U.S. attorney ended in July,
and the Trump administration’s “novel series of legal and personnel moves” to keep her in the role
-- without getting confirmation from the U.S. Senate
-- didn’t follow procedures required by federal l…
MPFormer: Adaptive Framework for Industrial Multi-Task Personalized Sequential Retriever
Yijia Sun, Shanshan Huang, Linxiao Che, Haitao Lu, Qiang Luo, Kun Gai, Guorui Zhou
https://arxiv.org/abs/2508.20400
Very high-energy gamma-ray and neutrino emission from hadronic interaction in compact binary millisecond pulsars
Vittoria Vecchiotti, Manuel Linares
https://arxiv.org/abs/2508.20952
FLAME: A Serving System Optimized for Large-Scale Generative Recommendation with Efficiency
Xianwen Guo, Bin Huang, Xiaomeng Wu, Guanlin Wu, Fangjian Li, Shijia Wang, Qiang Xiao, Chuanjiang Luo, Yong Li
https://arxiv.org/abs/2509.22681
GreenLLM: SLO-Aware Dynamic Frequency Scaling for Energy-Efficient LLM Serving
Qunyou Liu, Darong Huang, Marina Zapater, David Atienza
https://arxiv.org/abs/2508.16449 https://
Intra-request branch orchestration for efficient LLM reasoning
Weifan Jiang, Rana Shahout, Yilun Du, Michael Mitzenmacher, Minlan Yu
https://arxiv.org/abs/2509.24957 https://
RServe: Overlapping Encoding and Prefill for Efficient LMM Inference
Tianyu Guo, Tianming Xu, Xianjie Chen, Junru Chen, Nong Xiao, Xianwei Zhang
https://arxiv.org/abs/2509.24381
"In 2024, the group focused on recruiting and retention, women serving on submarines and mothers reintegrating to military life after pregnancy"
Pentagon shutters women’s advisory group
https://taskandpurpose.com/news/defense-womens-advisory-group/
National Federation of Community Broadcasters gets a $1.25M MacArthur grant; NFCB represents ~200 stations mostly serving rural and underrepresented communities (Austin Fuller/Current)
https://current.org/2025/10/national-federation-of-c…
Cycle is All You Need: More Is Different
Xin Li
https://arxiv.org/abs/2509.21340 https://arxiv.org/pdf/2509.21340…
Embodied AI: From LLMs to World Models
Tongtong Feng, Xin Wang, Yu-Gang Jiang, Wenwu Zhu
https://arxiv.org/abs/2509.20021 https://arxiv.org/pdf/2509.20021
DHS moves to bar aid groups from serving undocumented immigrants (Brianna Sacks/Washington Post)
https://www.washingtonpost.com/weather/2025/08/27/dhs-fema-undocumented-immigrants-aid-groups-grants/?pwapi_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJyZWFzb24iOiJnaWZ0IiwibmJmIjoxNzU2MjY3MjAwLCJpc3MiOiJzdWJzY3JpcHRpb25zIiwiZXhwIjoxNzU3NjQ5NTk5LCJpYXQiOjE3NTYyNjcyMDAsImp0aSI6ImM2ZDBlZDIyLTE5NmUtNGQzZi04ZmNiLWFkZDFjZjRlYWFkZCIsInVybCI6Imh0dHBzOi8vd3d3Lndhc2hpbmd0b25wb3N0LmNvbS93ZWF0aGVyLzIwMjUvMDgvMjcvZGhzLWZlbWEtdW5kb2N1bWVudGVkLWltbWlncmFudHMtYWlkLWdyb3Vwcy1ncmFudHMvIn0.jLBr6sPapG7mcyHKjeTPUwshLfB7UHhyklQxwk-f1xo&itid=gfta
http://www.memeorandum.com/250827/p96#a250827p96
Dario Amodei addresses "inaccurate claims" about Anthropic's policy stances after David Sacks said the "real issue" is "Anthropic's agenda to backdoor Woke AI" (Ashley Capoot/CNBC)
https://www.cnbc.com/2025/10/21/anthropic-ceo-trump-s…
Grounded AI for Code Review: Resource-Efficient Large-Model Serving in Enterprise Pipelines
Sayan Mandal, Hua Jiang
https://arxiv.org/abs/2510.10290 https://
The official Xubuntu website was compromised over the weekend (18/19 October 2025) briefly serving up Windows malware to users trying to download the distro.
https://www.omgubuntu.co.uk/2025/10/xubuntu-website-malware-hack
Optimal quantum simulation of linear non-unitary dynamics
Guang Hao Low, Rolando D. Somma
https://arxiv.org/abs/2508.19238 https://arxiv.org/pdf/2508.19238…
ExpertWeave: Efficiently Serving Expert-Specialized Fine-Tuned Adapters at Scale
Ge Shi, Hanieh Sadri, Qian Wang, Yu Zhang, Ying Xiong, Yong Zhang, Zhenan Fan
https://arxiv.org/abs/2508.17624
Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective
Tianyao Shi, Yi Ding
https://arxiv.org/abs/2508.16712 https://
Structure-aware Hypergraph Transformer for Diagnosis Prediction in Electronic Health Records
Haiyan Wang, Ye Yuan
https://arxiv.org/abs/2508.20500 https://…
Parallax: Efficient LLM Inference Service over Decentralized Environment
Chris Tong, Youhe Jiang, Gufeng Chen, Tianyi Zhao, Sibian Lu, Wenjie Qu, Eric Yang, Lynn Ai, Binhang Yuan
https://arxiv.org/abs/2509.26182
Alibaba Cloud details a GPU pooling system that it claims reduced the number of Nvidia H20 required by 82% when serving dozens of LLMs of up to 72B parameters (Vincent Chow/South China Morning Post)
https://www.scmp.com/business/article/3329
Justice Department says U.S. won't defend grants for Hispanic-serving colleges, calling them unconstitutional - CBS News
https://www.cbsnews.com/news/justice-department-hispanic-colleges-grants-unconstitutional/
Synthesizing Artifact Dataset for Pixel-level Detection
Dennis Menn, Feng Liang, Diana Marculescu
https://arxiv.org/abs/2509.19589 https://arxiv.org/pdf/25…
HyperFlexis: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling
Zahra Yousefijamarani, Xinglu Wang, Qian Wang, Morgan Lindsay Heisler, Taha Shabani, Niloofar Gholipour, Parham Yassini, Hong Chang, Kan Chen, Qiantao Zhang, Xiaolong Bai, Jiannan Wang, Ying Xiong, Yong Zhang, Zhenan Fan
https://arxiv.org/abs/2508.15…
City Matters, a free monthly newspaper serving the City of London since 2016, enters voluntary liquidation, citing rising print costs and declining ad revenue (Alice Brooker/Press Gazette)
https://pressgazette.co.uk/publishers/reg…
A senior CIA officer who oversaw Russia analysis
has been stripped of her security clearance,
part of a sweeping removal of
37 serving and former officials accused of "betray[ing] their oath to the Constitution," the Economist reported on Aug. 21.
The officer, who served as the CIA’s top Russia and Eurasia analyst during the 2016 election
and helped produce the report detailing Moscow’s interference on behalf of Donald Trump,
was among the most s…
Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving
Ziming Liu, Boyu Tian, Guoteng Wang, Zhen Jiang, Peng Sun, Zhenhua Han, Tian Tang, Xiaohe Hu, Yanmin Jia, Yan Zhang, He Liu, Mingjun Zhang, Yiqi Zhang, Qiaoling Chen, Shenggan Cheng, Mingyu Gao, Yang You, Siyuan Feng
https://arxiv.org/abs/2509.17863…
Justice Dept. declines to defend grants for Hispanic-serving colleges, calling them unconstitutional (Associated Press)
https://apnews.com/article/hispanic-colleges-trump-a795fc966590681f41410c2b3e268ac0
http://www.memeorandum.com/250822/p117#a250822p117
Equinox: Holistic Fair Scheduling in Serving Large Language Models
Zhixiang Wei, James Yen, Jingyi Chen, Ziyang Zhang, Zhibai Huang, Chen Chen, Xingzi Yu, Yicheng Gu, Chenggang Wu, Yun Wang, Mingyuan Xia, Jie Wu, Hao Wang, Zhengwei Qi
https://arxiv.org/abs/2508.16646
TinyML Towards Industry 4.0: Resource-Efficient Process Monitoring of a Milling Machine
Tim Langer, Matthias Widra, Volkhard Beyer
https://arxiv.org/abs/2508.16553 https://
Strata: Hierarchical Context Caching for Long Context Language Model Serving
Zhiqiang Xie, Ziyi Xu, Mark Zhao, Yuwei An, Vikram Sharma Mailthody, Scott Mahlke, Michael Garland, Christos Kozyrakis
https://arxiv.org/abs/2508.18572
TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving
Bingyang Wu, Zili Zhang, Yinmin Zhong, Guanzhe Huang, Yibo Zhu, Xuanzhe Liu, Xin Jin
https://arxiv.org/abs/2508.17219
Predictable LLM Serving on GPU Clusters
Erfan Darzi, Shreeanant Bharadwaj, Sree Bhargavi Balija
https://arxiv.org/abs/2508.20274 https://arxiv.org/pdf/2508…
Disaggregated Prefill and Decoding Inference System for Large Language Model Serving on Multi-Vendor GPUs
Xing Chen, Rong Shi, Lu Zhao, Lingbin Wang, Xiao Jin, Yueqiang Chen, Hongfeng Sun
https://arxiv.org/abs/2509.17542
ShadowServe: Interference-Free KV Cache Fetching for Distributed Prefix Caching
Xingyu Xiang, Raj Joshi, Yuhan Liu, Jiayi Yao, Chenxingyu Zhao, Junchen Jiang, Yang Zhou, Eddie Kohler, Minlan Yu
https://arxiv.org/abs/2509.16857
Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference
Rongzhi Li, Ruogu Du, Zefang Chu, Sida Zhao, Chunlei Han, Zuocheng Shi, Yiwen Shao, Huanle Han, Long Huang, Zherui Liu, Shufan Liu
https://arxiv.org/abs/2508.19559
FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters
Yanying Lin, Shijie Peng, Chengzhi Lu, Chengzhong Xu, Kejiang Ye
https://arxiv.org/abs/2510.11938
VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving
Jiahuan Yu (University of Illinois Urbana-Champaign), Aryan Taneja (University of Illinois Urbana-Champaign), Junfeng Lin (Tsinghua University), Minjia Zhang (University of Illinois Urbana-Champaign)
https://arxiv.org/abs/2509.04827
Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
Zizhao Mo, Jianxiong Liao, Huanle Xu, Zhi Zhou, Chengzhong Xu
https://arxiv.org/abs/2509.08309
Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling
Wei Da, Evangelia Kalyvianaki
https://arxiv.org/abs/2508.03611 https://
Kairos: Low-latency Multi-Agent Serving with Shared LLMs and Excessive Loads in the Public Cloud
Jinyuan Chen, Jiuchen Shi, Quan Chen, Minyi Guo
https://arxiv.org/abs/2508.06948