The issue is caused by CUDA device ordering. By default, CUDA_DEVICE_ORDER is FASTEST_FIRST, so device 0 is assigned to the GPU CUDA considers fastest (roughly the one with the highest compute capability), not the one with the lowest PCI bus ID. nvidia-smi, however, lists devices in PCI bus order, so the indices you pass to CUDA_VISIBLE_DEVICES may not map to the GPUs you intend, and vLLM can end up running on the 2080 Ti and failing with errors about features that card does not support. The recommended workaround is to set both CUDA_DEVICE_ORDER=PCI_BUS_ID and CUDA_VISIBLE_DEVICES before launching vLLM. This makes CUDA's device enumeration match nvidia-smi's, so the 3090s are the GPUs vLLM actually uses for inference. VM GPU passthrough is not required for this workaround to work.
Example:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=1,2 # replace with your 3090 indices as shown by nvidia-smi (PCI bus order)
python -m vllm.entrypoints.openai.api_server ...
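To confirm the mapping before starting the server, you can run a quick sanity check with PyTorch in the same environment (a sketch; the indices above are placeholders for your setup). With both variables exported as above, this should list only the 3090s:

python -c "import torch; [print(i, torch.cuda.get_device_name(i)) for i in range(torch.cuda.device_count())]"

If a 2080 Ti still shows up here, the indices in CUDA_VISIBLE_DEVICES don't match the PCI-bus-order numbering and need to be adjusted.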