About the KV-Cache category

| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| About the KV-Cache category | 0 | 44 | March 20, 2025 |
| How to get kv cache value from vllm | 5 | 91 | January 19, 2026 |
| Exposing KV cache for recomposition / reuse beyond prefix caching? | 1 | 4 | January 13, 2026 |
| Understanding vllm kv cache | 5 | 239 | December 1, 2025 |
| vLLM extremely slow / no response with max_model_len=8192 and multi-GPU tensor parallel | 1 | 359 | October 26, 2025 |
| Question on Advanced vLLM Use Case: Distributed Prefix Caching for a CAG Evaluation Framework | 1 | 74 | October 15, 2025 |
| How to do KV cache transfer between a CPU instance and a GPU instance? | 1 | 189 | July 31, 2025 |
| Something weird about the reading procedure of q_vecs in page attention kernel | 3 | 21 | June 9, 2025 |
| Computation time remains consistent across chunks in chunked-prefill despite linearly growing attention complexity? | 1 | 50 | June 2, 2025 |
| APC Slowdown with block-size=1 | 1 | 69 | May 26, 2025 |
| VUA - library code for LLM inference engines for external storage of KV caches | 1 | 69 | May 13, 2025 |
| Support for (sparse) key value caching | 16 | 424 | May 3, 2025 |
| How to crop kv_caches? | 0 | 62 | April 13, 2025 |
| Why is the prefix cache hit rate constantly increasing | 3 | 823 | March 27, 2025 |
| Prefix Cache control | 1 | 353 | March 24, 2025 |
| Avoiding hash collisions in prefix cache | 7 | 212 | March 24, 2025 |