RuntimeError: CUDA driver error: invalid device ordinal after the update to v0.11.0

I run in Kubernetes with the following setup:

vllm serve openai/gpt-oss-120b --trust-remote-code --tool-call-parser openai --enable-auto-tool-choice --max-model-len 60000 --gpu-memory-utilization 0.95 --tensor-parallel-size=2

Everything works when I set --tensor-parallel-size=1 or roll back to v0.10.2.


Your setup is correct for running vLLM with tensor parallelism, but failures with --tensor-parallel-size > 1 in Kubernetes are usually caused by resource allocation, NCCL, or container configuration rather than the serve command itself. Make sure the pod requests and limits include at least 2 GPUs, that /dev/shm is mounted with sufficient size, and that the pod shares the host IPC namespace (the Kubernetes equivalent of --ipc=host). Also verify that the NCCL version is compatible and that both GPUs are visible inside the container. These are common requirements for multi-GPU vLLM deployments in Kubernetes, as described in the vLLM docs and user reports. If you still hit issues, try pipeline parallelism as a workaround for uneven GPU splits or hardware without NVLink, and check the logs for NCCL-related errors for further debugging clues. See the official docs for more details on distributed serving and parallelism setup.
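For reference, a minimal pod-spec sketch covering those points might look like the following. This is only a sketch: the image tag, pod name, and shared-memory size are illustrative assumptions, not values from the original post, and it assumes the NVIDIA device plugin is installed.

```yaml
# Minimal sketch only; image tag, pod name, and sizes are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-gpt-oss-120b
spec:
  hostIPC: true                       # Kubernetes equivalent of --ipc=host
  containers:
  - name: vllm
    image: vllm/vllm-openai:v0.11.0   # assumed image tag
    command: ["vllm", "serve", "openai/gpt-oss-120b"]
    args:
      - --trust-remote-code
      - --tool-call-parser=openai
      - --enable-auto-tool-choice
      - --max-model-len=60000
      - --gpu-memory-utilization=0.95
      - --tensor-parallel-size=2
    resources:
      limits:
        nvidia.com/gpu: 2             # both GPUs must be allocated to this one pod
    volumeMounts:
    - name: shm
      mountPath: /dev/shm
  volumes:
  - name: shm
    emptyDir:
      medium: Memory
      sizeLimit: 16Gi                 # shared-memory size is an assumption; adjust to your workload
```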

If you need more detailed step-by-step troubleshooting, let me know.


@RunLLM But why does it work on v0.10.2 then?

If it works on v0.10.2 but not on newer versions, the likely cause is a change in how vLLM handles tensor parallelism, NCCL, or distributed initialization. Recent releases may have introduced stricter checks, new defaults, or compatibility issues with Kubernetes GPU allocation or NCCL setup that v0.10.2 was more permissive about or handled differently. This pattern shows up in user reports where multi-GPU setups break after upgrading, often due to changes in the distributed backend or in environment-variable handling.
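One way to narrow this down is to run the same checks inside the v0.10.2 and v0.11.0 pods and diff the output. A rough sketch, assuming kubectl access (the pod name is a placeholder):

```bash
# Rough sketch: <pod> is a placeholder for your vLLM pod name.
# Run the same checks against the v0.10.2 and v0.11.0 deployments and compare.

# GPUs visible inside the container
kubectl exec <pod> -- nvidia-smi -L

# What PyTorch sees; "invalid device ordinal" typically means a worker asked
# for a device index beyond what is visible to it
kubectl exec <pod> -- python3 -c "import torch; print(torch.cuda.device_count())"

# Device-mapping and NCCL-related environment variables
kubectl exec <pod> -- env | grep -E 'CUDA_VISIBLE_DEVICES|NCCL|VLLM'

# With NCCL_DEBUG=INFO set in the pod spec, look for init errors in the logs
kubectl logs <pod> | grep -i nccl
```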

Would you like a step-by-step comparison of environment variables, pod specs, or logs to pinpoint the exact change?


I observed the same thing. Can we expect a fix in the next vLLM release?


I have the same issue here. Switching back to 0.10.2 did fix the problem, as OP mentioned. The issue happens with both the 0.11.0 container and the 0.11.0 pip-installable package. Please fix the issue.