RuntimeError: CUDA driver error: invalid device ordinal after the update to v0.11.0

I run in Kubernetes with the following setup:

vllm serve openai/gpt-oss-120b --trust-remote-code --tool-call-parser openai --enable-auto-tool-choice --max-model-len 60000 --gpu-memory-utilization 0.95 --tensor-parallel-size=2

Everything works when I set --tensor-parallel-size=1 or roll back to v0.10.2.


Your setup is correct for running vLLM with tensor parallelism, but failures with --tensor-parallel-size > 1 in Kubernetes are usually caused by resource allocation, NCCL, or container configuration rather than the serve command itself. Make sure the pod requests and limits include at least 2 GPUs, that /dev/shm is mounted with sufficient size, and that the pod shares the host IPC namespace (the Kubernetes equivalent of --ipc=host). Also verify that the NCCL version is compatible and that both GPUs are visible inside the container. These are common requirements for multi-GPU vLLM deployments in Kubernetes, as described in the vLLM docs and user reports. If you still hit issues, try pipeline parallelism as a workaround for uneven GPU splits or hardware without NVLink, and check the logs for NCCL-related errors for further debugging clues. See the official docs for more details on distributed serving and parallelism setup.
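For reference, a minimal pod-spec sketch covering those points might look like the following. This is only a sketch: the image tag, pod name, and shared-memory size are illustrative assumptions, not values from the original post, and it assumes the NVIDIA device plugin is installed.

```yaml
# Minimal sketch only; image tag, pod name, and sizes are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-gpt-oss-120b
spec:
  hostIPC: true                       # Kubernetes equivalent of --ipc=host
  containers:
  - name: vllm
    image: vllm/vllm-openai:v0.11.0   # assumed image tag
    command: ["vllm", "serve", "openai/gpt-oss-120b"]
    args:
      - --trust-remote-code
      - --tool-call-parser=openai
      - --enable-auto-tool-choice
      - --max-model-len=60000
      - --gpu-memory-utilization=0.95
      - --tensor-parallel-size=2
    resources:
      limits:
        nvidia.com/gpu: 2             # both GPUs must be allocated to this one pod
    volumeMounts:
    - name: shm
      mountPath: /dev/shm
  volumes:
  - name: shm
    emptyDir:
      medium: Memory
      sizeLimit: 16Gi                 # shared-memory size is an assumption; adjust to your workload
```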

If you need more detailed step-by-step troubleshooting, let me know.


@RunLLM But why does it work on v0.10.2 then?

If it works on v0.10.2 but not on newer versions, the likely cause is a change in how vLLM handles tensor parallelism, NCCL, or distributed initialization. Recent releases may have introduced stricter checks, new defaults, or compatibility issues with Kubernetes GPU allocation or NCCL setup that v0.10.2 was more permissive about or handled differently. This pattern shows up in user reports where multi-GPU setups break after upgrading, often due to changes in the distributed backend or in environment-variable handling.
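One way to narrow this down is to run the same checks inside the v0.10.2 and v0.11.0 pods and diff the output. A rough sketch, assuming kubectl access (the pod name is a placeholder):

```bash
# Rough sketch: <pod> is a placeholder for your vLLM pod name.
# Run the same checks against the v0.10.2 and v0.11.0 deployments and compare.

# GPUs visible inside the container
kubectl exec <pod> -- nvidia-smi -L

# What PyTorch sees; "invalid device ordinal" typically means a worker asked
# for a device index beyond what is visible to it
kubectl exec <pod> -- python3 -c "import torch; print(torch.cuda.device_count())"

# Device-mapping and NCCL-related environment variables
kubectl exec <pod> -- env | grep -E 'CUDA_VISIBLE_DEVICES|NCCL|VLLM'

# With NCCL_DEBUG=INFO set in the pod spec, look for init errors in the logs
kubectl logs <pod> | grep -i nccl
```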

Would you like a step-by-step comparison of environment variables, pod specs, or logs to pinpoint the exact change?


I observed the same thing. Can we expect a fix in the next vLLM release?


I have the same issue here. Switching back to 0.10.2 did fix the problem, as OP mentioned. The issue happens with both the 0.11.0 container and the 0.11.0 pip-installable package. Please fix the issue.