NCCL P2P communication is supported on the RTX 5090, but there is a known NCCL bug affecting multi-GPU (tensor parallelism > 1) setups with 50xx/60xx-series cards, including the 5090, in vLLM and other inference engines. The issue is caused by older NCCL versions (e.g., nvidia-nccl-cu12==2.25.1 or 2.26.2) bundled in the official vLLM Docker images, which do not include the upstream patch that fixes P2P for these GPUs. Setting NCCL_P2P_DISABLE=1 does not resolve the problem in most cases. The fix is to upgrade NCCL to at least 2.26.5 or 2.27.3, or newer (e.g., 2.27.6 or 2.27.7), inside your Docker/container environment. This has been confirmed to resolve the issue for the 5090 and other Blackwell cards in vLLM with tensor parallelism > 1, and the upgrade is also required for some 6000-series cards.
The official vLLM Docker images may not yet include the fixed NCCL version by default, so you must update it manually. For example, add RUN pip install nvidia-nccl-cu12==2.27.7 to your Dockerfile after the FROM line that pulls the base image. This is a widely reported and resolved issue, and upgrading NCCL is the recommended solution. Would you like step-by-step instructions for updating your Docker image?
Can you give me step-by-step instructions for upgrading NCCL to 2.26.5?
I have pulled the latest vLLM Docker image, and I am currently in the VM, not in the container. What should I do next?
You cannot change the --tensor-parallel-size argument of a running container. You must stop and remove the current container, then start a new one with the desired argument.
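A rough sketch of that stop/remove/relaunch cycle (the container name, model, and port here are placeholders, not taken from the original thread):

```shell
# Stop and remove the existing container (name is a placeholder)
docker stop vllm-server
docker rm vllm-server

# Start a fresh container with the new tensor-parallel setting
docker run -d --name vllm-server --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model your-model --tensor-parallel-size 2
```

Any state inside the old container is lost on removal, which is exactly why the NCCL upgrade discussed below needs to live in the image, not the container.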
But then the new container has the old NCCL again; I only upgraded it inside the running container.
So how can I update NCCL in the vLLM Docker image and then run it with tensor parallelism = 2?
Give me step-by-step instructions, where the goal is to update the current vLLM Docker container to work with multiple 5090s (update the NCCL package) and then be able to run it with the updated package.
Take the docker launch command above, feed it to ChatGPT, Gemini, or whatever you use, mention the NCCL upgrade, and ask it to create a docker-compose.yml file for you. Attempting to do that NCCL upgrade via the run command is a pain.
If you don’t want to do that, then every time you launch the container with the updated TP command, you need to exec into it and run the NCCL upgrade command.
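For reference, the exec-based (temporary) upgrade would look roughly like this; the container name is a placeholder:

```shell
# Find the running container's name or ID
docker ps

# Upgrade NCCL inside the running container
# (this change is lost whenever the container is recreated)
docker exec -it vllm-server pip install -U nvidia-nccl-cu12==2.26.5
```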
You’re correct: upgrading NCCL via docker exec is only temporary, because each new container reverts to the original image. The best practice is to create a custom Docker image with NCCL 2.26.5 pre-installed, then use that image in your docker-compose.yml. This way, every time you launch the container (with any --tensor-parallel-size), the correct NCCL version is present automatically.
Step 1: Create a Dockerfile to upgrade NCCL
FROM vllm/vllm-openai:latest
RUN pip uninstall -y nvidia-nccl-cu12 && pip install nvidia-nccl-cu12==2.26.5
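Step 2 might then look like the following sketch, assuming the Dockerfile above is saved in the current directory; the image tag, model name, and port are placeholders:

```shell
# Build the custom image with the patched NCCL baked in
docker build -t vllm-openai-nccl:latest .

# Write a minimal docker-compose.yml that uses the custom image
cat > docker-compose.yml <<'EOF'
services:
  vllm:
    image: vllm-openai-nccl:latest
    ports:
      - "8000:8000"
    command: ["--model", "your-model", "--tensor-parallel-size", "2"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF

# Launch the service in the background
docker compose up -d
```

The GPU reservation block is the Compose v2 syntax for exposing NVIDIA GPUs; adjust count (or use device_ids) if you want to pin specific cards.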
Now, launching with docker-compose up will always use the correct NCCL version and your desired tensor parallelism. For more, see official Docker deployment docs.
Would you like more details or help customizing this for your setup?