Will CPU Offload be supported in V1?

--cpu-offload-gb is not supported by the V1 engine. Will this feature be supported later, or is it deprecated?

Previously, --cpu-offload-gb was only used for request preemption, not for expanding the KV cache size on the GPU. If you are looking for CPU offloading, stay tuned to "[V1][Core] Support offloading KV cache to CPU" by mengzhu28 (Pull Request #13377, vllm-project/vllm on GitHub). We are also working on V1 support in LMCache (LMCache/LMCache on GitHub, "Redis for LLMs").

In the end, I think it needs to be supported; we need to reimplement CPU offloading with PyTorch dispatch mode.

Please keep an eye on "[V1] Fully Transparent Implementation of CPU Offloading" by youkaichao (Pull Request #15354, vllm-project/vllm on GitHub)!