Will using Prefix Cache cause LLM to generate repetitive responses?

In my current use case, to improve the hit rate of the Prefix Cache, I recently adjusted the distribution strategy of model nodes—routing all requests from the same conversation to the same service node. Subsequently, users have reported a noticeable increase in repetitive responses, and I suspect there is a connection between these two changes.

Yes, a connection is likely. Routing all requests from the same conversation to the same node does increase the prefix-cache hit rate, but it also concentrates cache reuse on that node. If prefix caching is enabled and your workload involves multimodal inputs (e.g., images) or high concurrency, vLLM's prefix cache may not always distinguish requests whose text is identical but whose non-text content differs. The cache can then reuse KV blocks computed for a different request, producing repeated or garbled responses.

Disabling prefix caching (`--no-enable-prefix-caching`) has resolved similar issues for multiple users in the vLLM community, especially with multimodal or highly concurrent workloads: the cache key may not fully account for every input difference, so a cache hit on a non-identical request can surface these artifacts. See the discussion and user reports in vLLM Issue #20261.
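To make the failure mode concrete, here is a minimal toy sketch (not vLLM's actual implementation; the class and field names are invented for illustration) of why a cache keyed only on token IDs can serve stale state: two multimodal requests with identical text but different images map to the same key, so the second silently reuses the first one's cached result.

```python
# Toy illustration only -- NOT vLLM's real code. It models a prefix cache
# whose key is built solely from token IDs, ignoring non-text inputs.

class ToyPrefixCache:
    def __init__(self):
        self._blocks = {}  # key: tuple of token IDs -> cached "KV" state

    def lookup_or_compute(self, token_ids, image_digest):
        key = tuple(token_ids)  # note: image_digest is NOT part of the key
        if key in self._blocks:
            return self._blocks[key], True  # cache hit: stale state reused
        state = f"kv({key}, img={image_digest})"
        self._blocks[key] = state
        return state, False

cache = ToyPrefixCache()
state_a, hit_a = cache.lookup_or_compute([1, 2, 3], image_digest="cat.png")
state_b, hit_b = cache.lookup_or_compute([1, 2, 3], image_digest="dog.png")

# The second request gets a cache hit and reuses KV state computed
# for a *different* image:
assert hit_a is False
assert hit_b is True
assert state_b == state_a
```

A correct multimodal cache key would have to incorporate a digest of the non-text content as well; the sketch shows what happens when it does not.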

Would you like more detail on how prefix caching interacts with your workload or on mitigation strategies?


Note that in vLLM v1 prefix caching is currently enabled by default, so it can affect output quality even if you never turned it on explicitly.
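If you want to rule the cache out as the cause, you can disable it with the flag mentioned above when starting the server. This is a sketch of such an invocation; the model name and port are placeholders, not values from your deployment:

```shell
# Start vLLM with prefix caching disabled (v1 enables it by default).
vllm serve <your-model-name> \
    --port 8000 \
    --no-enable-prefix-caching
```

If repetition disappears with the flag set, that strongly suggests the per-conversation routing change exposed a cache-reuse issue rather than a model regression.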