Can’t use Ampere features

I have a 2080 Ti in the same system as several 3090s. When launching vLLM, it complains that Ampere features can’t be enabled because of the 2080 Ti.

However, the 2080 Ti isn’t included in CUDA_VISIBLE_DEVICES…

Is there a workaround, or is the only option to use VMs with GPU passthrough?

The issue is caused by CUDA device ordering. By default, CUDA may assign device 0 to the GPU with the highest compute capability rather than following PCI bus order (the order nvidia-smi shows), which can confuse vLLM’s device selection. Even with CUDA_VISIBLE_DEVICES set, the indices may not refer to the GPUs you intend, so vLLM can still probe the 2080 Ti and report that Ampere features are unsupported. The recommended workaround is to set both CUDA_VISIBLE_DEVICES and CUDA_DEVICE_ORDER=PCI_BUS_ID before launching vLLM. That makes the device enumeration match what vLLM and PyTorch expect, so only the 3090s are used for inference. VM GPU passthrough is not required for this workaround.

Example:

export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=1,2  # (replace with your 3090 device IDs)
python -m vllm.entrypoints.openai.api_server ...
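
As a quick sanity check before starting the server, you can confirm that only the 3090s are visible and that both report an Ampere compute capability (8.6). This is a minimal sketch assuming PyTorch is installed in the same environment as vLLM; the environment variables must be exported in the same shell before Python starts:

import torch

# With CUDA_DEVICE_ORDER=PCI_BUS_ID and CUDA_VISIBLE_DEVICES set as above,
# only the 3090s should be enumerated and the 2080 Ti should be absent.
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)  # 3090 -> (8, 6); 2080 Ti -> (7, 5)
    print(f"cuda:{i}: {name}, compute capability {major}.{minor}")

If the 2080 Ti still shows up here, the variables are not being picked up by the process that launches vLLM.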

For more details, see the discussion in vLLM issue #7472.

