Request preemption option

anhphantq · September 28, 2025, 10:00am

Hi everyone,

After skimming through the recent code, I think the preemption-by-swapping option is no longer used. I'm wondering why we don't use a KV connector to offload the KV cache blocks when a request is preempted (instead of recomputing, we would save them into LMCache's lower storage layer). The documentation says that recomputing is faster than swapping, but shouldn't there be a benchmark showing this?

Could someone confirm whether this is true?

Thank you so much!

RunLLM · September 28, 2025, 10:00am

In vLLM V1, preemption by recomputation is the default and preferred strategy because it has lower overhead than swapping (offloading KV cache to CPU or other storage). Swapping is only used for multi-sequence requests (e.g., beam search), as recomputation is not supported there. The documentation and code confirm that recomputation is generally more efficient, and swapping is deprecated for most cases. This is based on benchmarks and design discussions showing that recomputation overhead is never higher than 20% of swapping’s latency, and for small block sizes, recomputation is much faster due to the high cost of data movement in swapping. See the discussion and benchmarks in this issue and the optimization guide.
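As a rough illustration of why block size matters in this comparison, here is a toy cost model, not a benchmark. It assumes swapping pays a fixed per-block copy overhead plus a bandwidth term in both directions (out to host memory, then back in), and that recomputation costs one prefill pass over all tokens at a fixed throughput. That second assumption is a simplification: it ignores attention's superlinear cost on long sequences. The names `ModelShape`, `swap_latency_s`, and `recompute_latency_s`, and every number in the usage, are hypothetical placeholders rather than vLLM code or measured values.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelShape:
    num_layers: int
    num_kv_heads: int
    head_dim: int
    dtype_bytes: int  # e.g. 2 for fp16/bf16


def kv_bytes_per_token(m: ModelShape) -> int:
    # One K and one V vector per KV head, per layer, for every token.
    return 2 * m.num_layers * m.num_kv_heads * m.head_dim * m.dtype_bytes


def swap_latency_s(num_tokens: int, block_size: int, m: ModelShape,
                   bandwidth_bytes_per_s: float,
                   per_block_overhead_s: float) -> float:
    # Swapping copies every block out and later back in, so both the
    # bandwidth term and the per-block overhead are paid twice.
    # Smaller blocks mean more blocks, hence more fixed per-copy overhead.
    num_blocks = math.ceil(num_tokens / block_size)
    total_bytes = num_tokens * kv_bytes_per_token(m)
    transfer = 2 * total_bytes / bandwidth_bytes_per_s
    overhead = 2 * num_blocks * per_block_overhead_s
    return transfer + overhead


def recompute_latency_s(num_tokens: int, prefill_tokens_per_s: float) -> float:
    # Recomputation drops the KV cache and re-runs prefill over the prompt
    # plus already-generated tokens (simplified to a constant throughput).
    return num_tokens / prefill_tokens_per_s


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute measured numbers.
    shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128, dtype_bytes=2)
    for bs in (1, 16, 64):
        swap = swap_latency_s(2048, bs, shape, 25e9, 1e-5)
        recompute = recompute_latency_s(2048, 10_000.0)
        print(f"block_size={bs}: swap={swap:.4f}s recompute={recompute:.4f}s")
```

In this model only the overhead term depends on block size, which is why small blocks penalize swapping. A real benchmark would replace the placeholders with measured host-device bandwidth, per-copy overhead, and prefill throughput for the target GPU and model.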

If you want to use a KV connector (like LMCache) for offloading, it is possible, but the current vLLM design intentionally favors recomputation for performance reasons. Swapping/offloading is being re-evaluated for future extensibility, but for now, recomputation is the recommended and benchmarked approach for preemption in V1.
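To make the design in the question concrete, here is a toy sketch of the two resume paths for a preempted request: load saved KV blocks from a lower storage tier if they are available, otherwise recompute them. `Request`, `ToyOffloadStore`, `preempt`, and `resume` are hypothetical names for illustration only; they do not reflect vLLM's scheduler, its KV connector interface, or LMCache's API, and the KV blocks are stand-in byte strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Request:
    request_id: str
    token_ids: List[int]
    kv_blocks: Optional[List[bytes]] = None  # stand-in for GPU KV blocks


class ToyOffloadStore:
    """Hypothetical lower-tier store (e.g. CPU or disk), keyed by token sequence."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[int, ...], List[bytes]] = {}

    def save(self, token_ids: List[int], blocks: List[bytes]) -> None:
        self._data[tuple(token_ids)] = list(blocks)

    def load(self, token_ids: List[int]) -> Optional[List[bytes]]:
        return self._data.get(tuple(token_ids))


def preempt(req: Request, store: Optional[ToyOffloadStore]) -> None:
    # With a store, offload the blocks before freeing them; without one,
    # this is plain preemption by recomputation.
    if store is not None and req.kv_blocks is not None:
        store.save(req.token_ids, req.kv_blocks)
    req.kv_blocks = None  # GPU blocks are freed either way


def resume(req: Request, store: Optional[ToyOffloadStore],
           compute_kv: Callable[[List[int]], List[bytes]]) -> str:
    # Prefer a hit in the lower tier; fall back to recomputing (prefill).
    cached = store.load(req.token_ids) if store is not None else None
    if cached is not None:
        req.kv_blocks = cached
        return "loaded"
    req.kv_blocks = compute_kv(req.token_ids)
    return "recomputed"
```

Whether the "loaded" path wins in practice depends on the same trade-off as swapping: the load has to be cheaper than re-running prefill for those tokens.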

Would you like more detail on the benchmarks or implementation?

Sources:

