Disable KV cache storage in vLLM

I have a language model and I only need it to output 1 token, so the KV cache is not actually useful in my case. Is there any way to avoid caching KV during inference in vLLM?

No, vLLM always computes and stores the KV cache, even when `max_tokens=1`. There is currently no official configuration option or flag in vLLM to disable the KV cache for inference, and setting `use_cache=False` (as in Hugging Face) is not supported. Disabling or bypassing the KV cache would require custom code modifications or monkey-patching, which is not natively supported by vLLM's APIs or configuration options and is not recommended for production use. See the discussion and code references in the linked issues.
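
While the cache itself cannot be turned off, its memory footprint can be kept small for single-token generation by capping the context length and the memory pool. Below is a minimal sketch; the model name and the specific values for `max_model_len` and `gpu_memory_utilization` are placeholders you would tune for your setup.

```python
from vllm import LLM, SamplingParams

# The KV cache is still allocated, but a short max_model_len and a lower
# gpu_memory_utilization shrink the cache pool vLLM reserves at startup.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, substitute your own
    max_model_len=2048,                        # cap sequence length -> smaller KV-cache allocation
    gpu_memory_utilization=0.5,                # reserve less GPU memory for the cache pool
)

# Only the first generated token is needed for this use case.
params = SamplingParams(max_tokens=1)
outputs = llm.generate(["Classify the sentiment of: great movie!"], params)
print(outputs[0].outputs[0].text)
```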

Would you like more detail or code-level guidance on possible workarounds?
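
As a workaround outside vLLM, the Hugging Face path mentioned above does let you skip the cache entirely: a single forward pass with `use_cache=False` returns the next-token logits without storing any `past_key_values`. This is a sketch, not vLLM functionality, and the model name is an example placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("Classify the sentiment of: great movie!", return_tensors="pt")
with torch.no_grad():
    # One forward pass, no KV cache kept: use_cache=False disables past_key_values.
    logits = model(**inputs, use_cache=False).logits

# Greedily pick the single output token from the last position's logits.
next_token_id = logits[0, -1].argmax().item()
print(tok.decode(next_token_id))
```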
