How to run a model using heterogeneous GPUs

I have different GPUs, for example:

Two Hygon DCUs:

```
$ hy-smi

================================= System Management Interface ==================================
================================================================================================
HCU     Temp     AvgPwr     Perf     PwrCap     VRAM%      HCU%      Mode     
0       58.0C    135.0W     auto     350.0W     85%        0.0%      Normal   
1       48.0C    88.0W      auto     350.0W     0%         0.0%      Normal   
================================================================================================
======================================== End of SMI Log ========================================
```

Two NVIDIA GPUs:

NVIDIA A800 80GB PCIe
NVIDIA A100 80GB PCIe

I’m not sure how to run a model using these heterogeneous GPUs. Is that possible?

vLLM does not natively support running a single model across heterogeneous GPUs (different vendors or architectures, e.g., mixing Hygon DCUs and NVIDIA GPUs) in one inference instance. vLLM’s distributed inference (tensor/pipeline parallelism) requires all participating GPUs to be of the same type and architecture for correct operation and performance, as per official documentation and parallelism guidelines.

You can, however, run separate vLLM instances on each GPU type (e.g., one instance on the Hygon DCUs and another on the NVIDIA GPUs) and manage them independently; what you cannot do is shard a single model across both Hygon and NVIDIA GPUs in one vLLM process. For multi-GPU parallelism, use only identical GPUs within each vLLM instance, and select which GPUs each instance sees via the CUDA_VISIBLE_DEVICES environment variable for NVIDIA cards (or the Hygon equivalent), as described in this issue. Would you like more detail on how to set up vLLM for each GPU type separately?
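As a rough sketch of the separate-instance approach: pin each vLLM server to one GPU and give it its own port. The model name and port numbers below are placeholders, and the Hygon environment variable is an assumption (ROCm-style stacks typically honor HIP_VISIBLE_DEVICES, but verify this for your DCU toolkit):

```shell
# Instance 1: pin to the A800 (CUDA device 0), serve on port 8000
CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000 &

# Instance 2: pin to the A100 (CUDA device 1), serve on port 8001
CUDA_VISIBLE_DEVICES=1 vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8001 &

# On the Hygon DCUs (with a DCU-compatible vLLM build), device selection is
# analogous, e.g. HIP_VISIBLE_DEVICES=0,1 -- check your stack's documentation.
```

Each instance exposes an OpenAI-compatible endpoint, so you can balance requests across them with any external router (nginx, a simple round-robin client, etc.).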
