vLLM supporting CPU and GPU

I have observed that I can install vLLM for either CPU or CUDA, but not both. Is it possible for a single vLLM instance to support both?

That way, I could use an environment variable to specify the backend. Even better, it would give me the flexibility to run the prefill phase on GPU and the decode phase on CPU without any hassle.

Please help me with this. Thanks in advance.

vLLM does not natively support a single installation or instance that can switch between CPU and CUDA backends at runtime, nor can it run prefill on GPU and decode on CPU within the same process. You must build or install vLLM specifically for either CPU or GPU; switching requires separate builds or Docker images, and there is no environment variable that toggles the backend of a running instance.

Running prefill on GPU and decode on CPU is not supported out of the box and would require significant architectural changes or custom development, since vLLM's architecture is optimized for a single backend per instance and provides no built-in mechanism for such hybrid execution. See the discussions and error reports in this issue and the installation docs.
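As a practical workaround, you can run two separate vLLM servers (one CPU build, one GPU build, e.g. in separate Docker containers) and choose between them at the application layer. Below is a minimal sketch of that idea against vLLM's OpenAI-compatible API; the ports, model name, and the `VLLM_BACKEND` environment variable are assumptions of this example, not anything vLLM itself reads.

```python
import os
import requests

# Hypothetical setup: two independent vLLM OpenAI-compatible servers,
# one built for GPU and one for CPU, listening on different ports.
BACKENDS = {
    "gpu": "http://localhost:8000/v1",
    "cpu": "http://localhost:8001/v1",
}

def complete(prompt: str, model: str = "my-model") -> str:
    # The env variable is read by this client code, not by vLLM;
    # it simply picks which server the request is sent to.
    backend = os.environ.get("VLLM_BACKEND", "gpu")
    if backend not in BACKENDS:
        raise ValueError(f"Unknown backend: {backend!r}")
    resp = requests.post(
        f"{BACKENDS[backend]}/completions",
        json={"model": model, "prompt": prompt, "max_tokens": 64},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```

Note that the env variable here only affects which server the client talks to; each server remains a single-backend vLLM instance, so requests are routed whole rather than split between prefill and decode.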

Would you like more detail on possible workarounds or multi-instance deployment strategies?
