I am using vLLM for an AI chatbot.
There will be many active chat threads connected via WebSocket. All chat-thread information is stored in an in-memory database, and when a chat thread disconnects, I delete its corresponding data from that database.
Since vLLM caches prefixes, is there functionality to delete the prefix cache for a given chat thread?
From a developer-experience point of view, when a chat socket disconnects I should be able to delete that chat's prefix caches.
How can I implement this with the current vLLM, or is this a feature in progress?
vLLM's prefix caching is fully automatic, so you don't need to delete cached prefixes explicitly. Cached blocks are evicted and reallocated to new requests according to an LRU eviction policy, so blocks belonging to a disconnected chat stop being reused and are naturally reclaimed under memory pressure. You can read the prefix caching design doc for details: Automatic Prefix Caching — vLLM
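To illustrate why explicit per-chat deletion isn't needed, here is a toy sketch of LRU block eviction (not vLLM's actual block manager; the class and hashes are made up for illustration). Blocks for a disconnected chat simply stop being looked up, so they age toward the cold end of the cache and get evicted when new requests need space:

```python
from collections import OrderedDict

class PrefixBlockCache:
    """Toy LRU cache of prefix blocks (illustration only, NOT vLLM's
    real implementation). Blocks that stop being reused, e.g. because
    their chat disconnected, age out automatically under capacity
    pressure."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # prefix hash -> cached KV block

    def lookup(self, prefix_hash: str):
        # A cache hit marks the block as most recently used.
        if prefix_hash in self.blocks:
            self.blocks.move_to_end(prefix_hash)
            return self.blocks[prefix_hash]
        return None

    def insert(self, prefix_hash: str, block) -> None:
        # When full, evict the least recently used block.
        if prefix_hash not in self.blocks and len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)
        self.blocks[prefix_hash] = block
        self.blocks.move_to_end(prefix_hash)

cache = PrefixBlockCache(capacity=2)
cache.insert("chat-a", "kv0")
cache.insert("chat-b", "kv1")
cache.lookup("chat-a")           # chat-a is active, so it stays warm
cache.insert("chat-c", "kv2")    # evicts chat-b, the least recently used
```

After the last insert, `chat-b`'s block is gone without anyone ever deleting it explicitly; the same dynamic reclaims a disconnected chat's prefix blocks in vLLM.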