--cpu-offload-gb is not supported by the V1 engine. Will this feature be supported later, or is it deprecated?
Previously, --cpu-offload-gb was only used for request preemption, not for expanding the KV cache size on the GPU. If you are looking for CPU offloading, stay tuned to "[V1][Core] Support offloading KV cache to CPU" by mengzhu28 (Pull Request #13377 in vllm-project/vllm on GitHub). We are also working on V1 support in LMCache (LMCache/LMCache on GitHub: "Redis for LLMs").
In the end, I think it needs to be supported. We need to reimplement CPU offloading with PyTorch dispatch mode.