When Prefix Caching is disabled in vLLM, is it confirmed that no cross-request KV cache matching takes place? Is the KV cache lifecycle then limited to the current request?
Yes. With prefix caching disabled, vLLM performs no cross-request KV cache matching: each request's KV cache lives only for the duration of that request and is never shared with or reused by another request. The design documentation and related discussions describe prefix caching as the mechanism that enables KV cache sharing across requests, so turning it off restricts the cache to per-request use. Once a request finishes, its KV cache is no longer available to later requests[^1][^2][^3].
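To make the difference concrete, here is a minimal toy model of the behavior described above. It is a sketch, not vLLM's actual implementation: `ToyKVCacheManager`, `run_request`, and `BLOCK_SIZE` are hypothetical names invented for illustration, and the model only tracks full blocks of tokens keyed by a hash of their prefix.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (toy value, chosen for illustration)


@dataclass
class ToyKVCacheManager:
    """Toy model of a block-based KV cache; NOT vLLM's real code."""

    enable_prefix_caching: bool
    # Maps hash(prefix tokens up to the end of a block) -> block id.
    cached_blocks: dict = field(default_factory=dict)
    next_block_id: int = 0

    def run_request(self, token_ids):
        """Simulate one request; return (reused_blocks, computed_blocks).

        Only full blocks are counted; a trailing partial block is ignored.
        """
        reused = 0
        computed = 0
        owned = []  # (prefix_hash, block_id) pairs computed by this request
        prefix_still_matches = self.enable_prefix_caching

        for i in range(len(token_ids) // BLOCK_SIZE):
            key = hash(tuple(token_ids[: (i + 1) * BLOCK_SIZE]))
            # Cross-request lookup happens only when prefix caching is on.
            if prefix_still_matches and key in self.cached_blocks:
                reused += 1
                continue
            prefix_still_matches = False
            owned.append((key, self.next_block_id))
            self.next_block_id += 1
            computed += 1

        # Request finished: with prefix caching, blocks stay discoverable;
        # without it, they are simply released and nothing is retained.
        if self.enable_prefix_caching:
            for key, block_id in owned:
                self.cached_blocks[key] = block_id
        return reused, computed


if __name__ == "__main__":
    prompt = list(range(10))  # two full blocks plus a partial one
    for flag in (False, True):
        manager = ToyKVCacheManager(enable_prefix_caching=flag)
        first = manager.run_request(prompt)
        second = manager.run_request(prompt)
        print(f"enable_prefix_caching={flag}: first={first}, second={second}")
```

In the toy model, sending the same prompt twice with the flag off recomputes every block both times, while with the flag on the second request reuses the blocks left behind by the first. In vLLM itself the corresponding switch is the `enable_prefix_caching` engine argument.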
Would you like more detail on how the KV cache is managed internally or how to configure prefix caching?
Sources: