KV cache when Prefix Caching is disabled

libin817927 · May 22, 2025, 6:25am

When Prefix Caching is disabled in vLLM, is it confirmed that there is no cross-request KV cache matching logic? Is the KV cache lifecycle then limited to the current request?

RunLLM · May 22, 2025, 6:26am

Yes. When prefix caching is disabled in vLLM, there is no cross-request KV cache matching logic: the KV cache is used only within the lifecycle of the current request and is not shared or reused across requests. The design documentation and related discussions describe prefix caching as the mechanism that enables KV cache sharing across requests, so disabling it restricts the cache to per-request use. Once a request finishes, its KV cache is not available to later requests, even if they start with the same prompt prefix[^1][^2][^3].
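To make the difference concrete, here is a minimal toy sketch of a block-based KV cache with and without cross-request reuse. It is not vLLM's implementation: `ToyKVCacheManager`, `run_request`, and `BLOCK_SIZE` are hypothetical names made up for illustration, and eviction and memory limits are ignored. The only name borrowed from vLLM is the `enable_prefix_caching` engine argument, which is what toggles this behavior there.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (illustrative value, not a vLLM default)


@dataclass
class ToyKVCacheManager:
    """Toy model of a block-based KV cache; hypothetical, not vLLM code."""
    enable_prefix_caching: bool
    # Keys identify the full token prefix ending at a block boundary.
    cached_prefixes: set = field(default_factory=set)

    def run_request(self, token_ids):
        """Return (blocks reused from earlier requests, blocks computed)."""
        reused, computed = 0, 0
        request_prefixes = []
        # Only full blocks are considered cacheable in this sketch.
        full_len = len(token_ids) - len(token_ids) % BLOCK_SIZE
        for end in range(BLOCK_SIZE, full_len + 1, BLOCK_SIZE):
            key = tuple(token_ids[:end])
            if self.enable_prefix_caching and key in self.cached_prefixes:
                reused += 1  # matched a block left by an earlier request
            else:
                computed += 1  # block must be computed for this request
                request_prefixes.append(key)
        if self.enable_prefix_caching:
            # Blocks stay discoverable after the request finishes.
            self.cached_prefixes.update(request_prefixes)
        # With prefix caching disabled, nothing is recorded: the blocks
        # are released with the request and can never be matched later.
        return reused, computed


# Two requests that share their first 8 tokens (2 full blocks).
prompt_a = list(range(10))
prompt_b = list(range(8)) + [99, 100, 101, 102]

for enabled in (False, True):
    manager = ToyKVCacheManager(enable_prefix_caching=enabled)
    print(enabled, manager.run_request(prompt_a), manager.run_request(prompt_b))
```

With the flag off, the second request recomputes every block even though its first two blocks are identical to the first request's; with the flag on, those two blocks are reused and only the last one is computed.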

Would you like more detail on how the KV cache is managed internally or how to configure prefix caching?

Sources:

Automatic Prefix Caching — vLLM
