Mixed GPU support?

Is it possible to run inference across GPUs from multiple vendors? We have GPUs from several vendors, all with 32 GB of VRAM. Being able to pool them together would be ideal for running larger models.

vLLM does not support pooling GPUs from multiple vendors (e.g., mixing NVIDIA and AMD) in a single inference run. All GPUs used for tensor or pipeline parallelism must come from the same vendor, and distributed inference generally requires a homogeneous environment. There is no documented method or configuration in vLLM for cross-vendor GPU pooling; heterogeneous GPU support does not appear in the official documentation or issue tracker.

For multi-node or multi-GPU setups, vLLM requires the same execution environment and compatible hardware across all nodes and devices, as described in the distributed serving documentation. Mixing GPU vendors is unsupported and can lead to incompatibilities or runtime errors. Would you like more detail on supported distributed configurations, or on alternatives for heterogeneous hardware setups?
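As a practical safeguard, you can fail fast before launching a tensor-parallel run if the visible devices are not identical. The sketch below is illustrative, not vLLM functionality: the `assert_homogeneous` helper is hypothetical, and in practice you would feed it the per-device names reported by your runtime (e.g. `torch.cuda.get_device_name(i)` on CUDA or ROCm builds of PyTorch).

```python
def assert_homogeneous(gpu_names):
    """Raise if the visible GPUs are not all the same model.

    gpu_names: list of device-name strings, e.g. one entry per
    visible device as reported by the runtime.
    """
    unique = set(gpu_names)
    if len(unique) > 1:
        raise RuntimeError(
            f"Heterogeneous GPUs detected: {sorted(unique)}; "
            "vLLM tensor/pipeline parallelism expects identical devices."
        )

# A homogeneous pool passes silently...
assert_homogeneous(["NVIDIA A100 80GB"] * 4)

# ...while a mixed pool is rejected up front instead of
# failing with an opaque error mid-inference.
try:
    assert_homogeneous(["NVIDIA A100 80GB", "AMD Instinct MI300X"])
except RuntimeError as exc:
    print("refused:", exc)
```

Running a check like this in your launch script turns a vendor mismatch into an immediate, readable error rather than a distributed-runtime failure.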

Hint: Mention @RunLLM in the post for followups.