We encountered an issue where a request began repeating its output endlessly, and other requests sharing the same prefix (usually the same system prompt) then also repeated in the same stable way. The cause of the first request's infinite repetition is unknown. Slightly modifying the system prompt of a request immediately eliminated the repetition. We suspect this is related to the prefix-cache feature, but after consulting the relevant documentation, we found only an option to turn it off, and no option to adjust the invalidation time.
vLLM’s prefix cache (automatic prefix caching) reuses cached key-value (KV) blocks for identical token prefixes, such as a shared system prompt, to speed up inference. If a request with a certain prefix produces infinite repetition or corrupted output, subsequent requests with the same prefix may reuse the problematic cached blocks, repeating the error. Modifying the system prompt changes the prefix hash, so the cache is not reused and the issue disappears.

There is currently no built-in option to set a cache invalidation time (TTL). Cache eviction is managed automatically by an LRU (least recently used) policy, and blocks are evicted only when the KV cache is full and new requests require memory. As of now, you can only disable prefix caching entirely with the --no-enable-prefix-caching flag or reset the cache via the API; you cannot set a custom expiration time for cached prefixes. See details in prefix caching design, cache eviction, and API for cache reset.
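To illustrate why requests with the same system prompt hit the same cached blocks, here is a simplified sketch of chained block hashing in the spirit of the prefix-caching design. The block size, hash function, and `block_hashes` helper are illustrative assumptions, not vLLM's actual implementation:

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # illustrative; vLLM's real block size is configurable and larger


def block_hashes(tokens: List[int]) -> List[str]:
    """Hash each full token block, chaining in the previous block's hash,
    so a block's identity depends on the entire prefix before it."""
    hashes: List[str] = []
    prev = ""
    for i in range(0, len(tokens) - len(tokens) % BLOCK_SIZE, BLOCK_SIZE):
        block = tokens[i:i + BLOCK_SIZE]
        digest = hashlib.sha256((prev + repr(block)).encode()).hexdigest()[:12]
        hashes.append(digest)
        prev = digest
    return hashes


# Two requests sharing the same system-prompt prefix produce identical
# leading block hashes, so the second request reuses the first one's
# cached KV blocks -- including any blocks tied to the corrupted state.
system_prompt = [101, 102, 103, 104, 105, 106, 107, 108]
req_a = system_prompt + [1, 2, 3, 4]
req_b = system_prompt + [5, 6, 7, 8]
assert block_hashes(req_a)[:2] == block_hashes(req_b)[:2]

# Changing even one token of the system prompt changes every block hash
# from that point on, which is why editing the prompt clears the issue.
modified = [999] + system_prompt[1:] + [1, 2, 3, 4]
assert block_hashes(modified)[0] != block_hashes(req_a)[0]
```

If your vLLM version exposes the cache-reset endpoint mentioned above, it can typically be invoked against a running OpenAI-compatible server with something like `curl -X POST http://localhost:8000/reset_prefix_cache` (check your version's API docs, as availability varies by release).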
Would you like more detail on how to programmatically reset the prefix cache or on the LRU eviction mechanism?
@RunLLM How do I programmatically reset the prefix cache, and how does the LRU eviction mechanism work?