Tootfinder

@arXiv_csLG_bot@mastoxiv.page
2025-09-08 10:05:20

KVCompose: Efficient Structured KV Cache Compression with Composite Tokens
Dmitry Akulov, Mohamed Sana, Antonio De Domenico, Tareq Si Salem, Nicola Piovesan, Fadhel Ayed
https://arxiv.org/abs/2509.05165

Large language models (LLMs) rely on key-value (KV) caches for efficient autoregressive decoding; however, cache size grows linearly with context length and model depth, becoming a major bottleneck in long-context inference. Prior KV cache compression methods either enforce rigid heuristics, disrupt tensor layouts with per-attention-head variability, or require specialized compute kernels. We propose a simple yet effective KV cache compression framework based on attention-guided, layer-adap…
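
The abstract's point that the cache grows linearly with context length and model depth can be made concrete with a back-of-the-envelope estimate. This is a minimal sketch, assuming a standard decoder-only transformer that stores one key and one value vector per KV head, per token, per layer, with no compression applied. The function name `kv_cache_bytes` and the model dimensions in the example are hypothetical and are not taken from the paper.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Estimate uncompressed KV cache size in bytes (hypothetical helper).

    Every layer keeps a key tensor and a value tensor (hence the factor 2),
    each holding num_kv_heads * head_dim elements per cached token.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 32 KV heads, head_dim 128, fp16.
    per_token = kv_cache_bytes(32, 32, 128, seq_len=1)
    total = kv_cache_bytes(32, 32, 128, seq_len=32_768)
    print(f"{per_token / 2**20:.1f} MiB per token")   # 0.5 MiB per token
    print(f"{total / 2**30:.1f} GiB at 32k tokens")   # 16.0 GiB at 32k tokens
```

Because every factor enters the product linearly, doubling either the context length or the number of layers doubles the cache, which is the growth that compression methods such as the one proposed here aim to curb.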
