vLLM supporting CPU and GPU

I have observed that I can install vLLM for either CPU or CUDA, but not both. Is it possible for a single vLLM instance to support both?

That way, I could use an environment variable to specify the backend. Even better, it would give me the flexibility to run the prefill phase on GPU and the decode phase on CPU without any hassle.

Please help me with this. Thanks in advance.

vLLM does not natively support a single installation or instance that can switch between CPU and CUDA backends at runtime, nor can it run prefill on GPU and decode on CPU within the same process. You must build or install vLLM specifically for either CPU or GPU; switching requires separate builds or Docker images, and there is no environment variable that toggles the backend of a running instance.

Running prefill on GPU and decode on CPU is not supported out of the box and would require significant architectural changes or custom development, since vLLM's architecture is optimized for a single backend per instance and provides no built-in mechanism for such hybrid execution. See the discussions and error reports in this issue and the installation docs.
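As a practical workaround, you can run two separate vLLM servers (one CPU build, one GPU build, e.g. in separate Docker containers) and choose between them at the application layer. Below is a minimal sketch of that idea against vLLM's OpenAI-compatible API; the ports, model name, and the `VLLM_BACKEND` environment variable are assumptions of this example, not anything vLLM itself reads.

```python
import os
import requests

# Hypothetical setup: two independent vLLM OpenAI-compatible servers,
# one built for GPU and one for CPU, listening on different ports.
BACKENDS = {
    "gpu": "http://localhost:8000/v1",
    "cpu": "http://localhost:8001/v1",
}

def complete(prompt: str, model: str = "my-model") -> str:
    # The env variable is read by this client code, not by vLLM;
    # it simply picks which server the request is sent to.
    backend = os.environ.get("VLLM_BACKEND", "gpu")
    if backend not in BACKENDS:
        raise ValueError(f"Unknown backend: {backend!r}")
    resp = requests.post(
        f"{BACKENDS[backend]}/completions",
        json={"model": model, "prompt": prompt, "max_tokens": 64},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```

Note that the env variable here only affects which server the client talks to; each server remains a single-backend vLLM instance, so requests are routed whole rather than split between prefill and decode.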

Would you like more detail on possible workarounds or multi-instance deployment strategies?
