Mixed GPU support?

Is it possible to run inference across GPUs from multiple vendors? We have GPUs from several vendors, all with 32 GB of VRAM. Being able to pool them together would be ideal for running larger models.

vLLM does not support pooling GPUs from multiple vendors (e.g., mixing NVIDIA and AMD) in a single inference run. All GPUs used for tensor or pipeline parallelism must come from the same vendor, and distributed inference generally requires a homogeneous environment. There is no documented method or configuration in vLLM for cross-vendor GPU pooling; heterogeneous GPU support does not appear in the official documentation or issue tracker.

For multi-node or multi-GPU setups, vLLM requires the same execution environment and compatible hardware across all nodes and devices, as described in the distributed serving documentation. Mixing GPU vendors is unsupported and can lead to incompatibilities or runtime errors. Would you like more detail on supported distributed configurations, or on alternatives for heterogeneous hardware setups?
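As a practical safeguard, you can fail fast before launching a tensor-parallel run if the visible devices are not identical. The sketch below is illustrative, not vLLM functionality: the `assert_homogeneous` helper is hypothetical, and in practice you would feed it the per-device names reported by your runtime (e.g. `torch.cuda.get_device_name(i)` on CUDA or ROCm builds of PyTorch).

```python
def assert_homogeneous(gpu_names):
    """Raise if the visible GPUs are not all the same model.

    gpu_names: list of device-name strings, e.g. one entry per
    visible device as reported by the runtime.
    """
    unique = set(gpu_names)
    if len(unique) > 1:
        raise RuntimeError(
            f"Heterogeneous GPUs detected: {sorted(unique)}; "
            "vLLM tensor/pipeline parallelism expects identical devices."
        )

# A homogeneous pool passes silently...
assert_homogeneous(["NVIDIA A100 80GB"] * 4)

# ...while a mixed pool is rejected up front instead of
# failing with an opaque error mid-inference.
try:
    assert_homogeneous(["NVIDIA A100 80GB", "AMD Instinct MI300X"])
except RuntimeError as exc:
    print("refused:", exc)
```

Running a check like this in your launch script turns a vendor mismatch into an immediate, readable error rather than a distributed-runtime failure.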

Hint: Mention @RunLLM in the post for followups.