I am using vLLM for an AI chatbot.
There will be many active chat threads connected via WebSocket. All chat-thread information is stored in an in-memory database, and when a chat thread disconnects, I delete its corresponding data from that database.
Since vLLM caches prefixes, is there functionality to delete the prefix cache for a given chat thread?
From a developer-experience point of view, when a chat socket disconnects I should be able to delete that chat's prefix caches.
How can I implement this with the current vLLM, or is this a feature in progress?
vLLM's prefix caching is fully automatic, so you don't need to delete cached prefixes explicitly. Cached blocks are evicted and reallocated to new requests according to an LRU eviction policy, so blocks belonging to a disconnected chat stop being reused and are naturally reclaimed under memory pressure. You can read the prefix caching design doc for details: Automatic Prefix Caching — vLLM
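To illustrate why explicit per-chat deletion isn't needed, here is a toy sketch of LRU block eviction (not vLLM's actual block manager; the class and hashes are made up for illustration). Blocks for a disconnected chat simply stop being looked up, so they age toward the cold end of the cache and get evicted when new requests need space:

```python
from collections import OrderedDict

class PrefixBlockCache:
    """Toy LRU cache of prefix blocks (illustration only, NOT vLLM's
    real implementation). Blocks that stop being reused, e.g. because
    their chat disconnected, age out automatically under capacity
    pressure."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # prefix hash -> cached KV block

    def lookup(self, prefix_hash: str):
        # A cache hit marks the block as most recently used.
        if prefix_hash in self.blocks:
            self.blocks.move_to_end(prefix_hash)
            return self.blocks[prefix_hash]
        return None

    def insert(self, prefix_hash: str, block) -> None:
        # When full, evict the least recently used block.
        if prefix_hash not in self.blocks and len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)
        self.blocks[prefix_hash] = block
        self.blocks.move_to_end(prefix_hash)

cache = PrefixBlockCache(capacity=2)
cache.insert("chat-a", "kv0")
cache.insert("chat-b", "kv1")
cache.lookup("chat-a")           # chat-a is active, so it stays warm
cache.insert("chat-c", "kv2")    # evicts chat-b, the least recently used
```

After the last insert, `chat-b`'s block is gone without anyone ever deleting it explicitly; the same dynamic reclaims a disconnected chat's prefix blocks in vLLM.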