Will CPU Offload be supported in V1?

artemus717 · March 21, 2025, 8:42am

--cpu-offload-gb is not supported by the V1 engine. Will this feature be supported later, or is it deprecated?

Kuntai · March 21, 2025, 5:27pm

Previously, --cpu-offload-gb was used only for request preemption, not for expanding the KV cache size on the GPU. If you are looking for CPU offloading, stay tuned to Pull Request #13377 in vllm-project/vllm, "[V1][Core] Support offloading KV cache to CPU" by mengzhu28. We are also working on V1 support in LMCache (LMCache/LMCache on GitHub, "Redis for LLMs").

youkaichao · March 22, 2025, 1:57am

In the end, I think it needs to be supported. We need to reimplement CPU offloading with PyTorch dispatch mode.
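
For readers unfamiliar with the term, here is a minimal sketch of what intercepting operators with a PyTorch dispatch mode could look like. It is an illustration only, not vLLM's implementation or the planned design: the names `OnDemandDeviceMode` and `run_demo` are hypothetical, and it assumes the offloaded tensor is simply copied to the compute device each time an operator touches it, with no caching, pinning, or prefetching.

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode
from torch.utils._pytree import tree_map


class OnDemandDeviceMode(TorchDispatchMode):
    """Hypothetical sketch: copy CPU-resident tensors to the compute device per op."""

    def __init__(self, compute_device):
        super().__init__()
        self.compute_device = torch.device(compute_device)
        self.ops_seen = 0  # number of operators intercepted, for illustration

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        self.ops_seen += 1

        def to_compute(x):
            # Only tensors that live on the CPU are copied; anything else passes through.
            if isinstance(x, torch.Tensor) and x.device.type == "cpu":
                return x.to(self.compute_device)
            return x

        args = tree_map(to_compute, args)
        kwargs = tree_map(to_compute, kwargs or {})
        # The mode is disabled inside this handler, so this call does not recurse.
        return func(*args, **kwargs)


def run_demo():
    # Falls back to the CPU when no GPU is present, in which case no copy happens.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    weight = torch.randn(4, 8)            # stays on the CPU ("offloaded")
    x = torch.randn(2, 8, device=device)  # activations on the compute device
    mode = OnDemandDeviceMode(device)
    with mode:
        out = torch.nn.functional.linear(x, weight)
    expected = torch.nn.functional.linear(x.cpu(), weight)
    return out.cpu(), expected, mode.ops_seen
```

Calling `run_demo()` returns the result computed under the mode, a CPU reference result, and the number of intercepted operators. A real implementation would need to decide when to free the device copy and how to overlap transfers with compute, which this sketch does not attempt.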

youkaichao · March 24, 2025, 3:56pm
