vLLM does not natively support running a single model across heterogeneous GPUs (different vendors or architectures, e.g., mixing Hygon DCUs and NVIDIA GPUs) in one inference instance. Its distributed inference (tensor/pipeline parallelism) assumes that all participating GPUs are of the same type and architecture, both for correctness and for performance, per the official documentation and parallelism guidelines.
You can, however, run a separate vLLM instance per GPU type (e.g., one instance on the Hygon DCUs, another on the NVIDIA GPUs) and manage them independently; what you cannot do is shard a single model across both Hygon and NVIDIA GPUs in one vLLM process. Within each instance, use only identical GPUs for multi-GPU parallelism, and control which GPUs that instance sees via the CUDA_VISIBLE_DEVICES environment variable for NVIDIA cards (or the equivalent for Hygon DCUs), as described in this issue. Would you like more detail on how to set up vLLM for each GPU type separately?
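As a sketch of the separate-instance approach: each vLLM server is launched with only its own GPUs visible, and a thin client-side router picks the right endpoint per request. The ports, model path, and the `endpoint_for` helper below are illustrative assumptions for this answer, not part of vLLM's API; both instances expose vLLM's OpenAI-compatible HTTP server, so clients can talk to either one the same way.

```python
# Hypothetical launch commands (ports and model path are assumptions):
#   NVIDIA instance:
#     CUDA_VISIBLE_DEVICES=0,1 vllm serve my-model --tensor-parallel-size 2 --port 8000
#   Hygon DCU instance (using the DCU stack's equivalent device-selection variable):
#     <device-selection-env>=0,1 vllm serve my-model --tensor-parallel-size 2 --port 8001
#
# A minimal client-side router that maps a backend name to the base URL of
# the vLLM instance serving it.

BACKENDS = {
    "nvidia": "http://localhost:8000/v1",  # assumed port for the NVIDIA instance
    "hygon": "http://localhost:8001/v1",   # assumed port for the Hygon DCU instance
}


def endpoint_for(backend: str) -> str:
    """Return the OpenAI-compatible base URL for the given backend name."""
    try:
        return BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}")
```

A request would then be sent through any OpenAI-compatible client pointed at the returned base URL; load balancing or failover between the two instances would likewise live in this client-side layer, since the instances share no state.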