vLLM does not work with 2x 5090 in TP=2

Hi,
I can't get 2x 5090 to work with vLLM (latest Docker image) at tp = 2.
I can see that there is some patch:

But it's not applied yet? Why doesn't the latest vLLM Docker image have this patch? This prevents using vLLM with multiple 50-series cards, and also the 6000 series?

NCCL P2P communication is supported on the RTX 5090, but a known NCCL bug affects multi-GPU (tensor parallelism > 1) setups with 50xx/60xx-series cards, including the 5090, in vLLM and other inference engines. The issue is caused by older NCCL versions (e.g., nvidia-nccl-cu12==2.25.1 or 2.26.2) bundled in the official vLLM Docker images, which do not include the upstream patch that fixes P2P for these GPUs. Forcing NCCL_P2P_DISABLE=1 does not resolve the problem in most cases. The fix is to upgrade NCCL to at least 2.26.5, 2.27.3, or newer (e.g., 2.27.6 or 2.27.7) inside your Docker/container environment. This has been confirmed to resolve the issue for 5090 and Blackwell cards in vLLM with tensor parallelism > 1, and is also required for some 6000-series cards.
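As a quick sanity check, you can script the "is my NCCL old enough to be affected?" comparison with `sort -V` (version-aware sort). The installed version below is a stand-in; substitute whatever `pip show nvidia-nccl-cu12` reports in your container:

```shell
# Compare an installed NCCL version against the minimum fixed release.
# "2.25.1" here is a placeholder matching the version bundled in older
# vLLM images -- replace it with the real output of:
#   pip show nvidia-nccl-cu12
installed="2.25.1"
minimum="2.26.5"
# sort -V orders version strings numerically; the lowest comes out first.
lowest=$(printf '%s\n%s\n' "$installed" "$minimum" | sort -V | head -n1)
if [ "$lowest" = "$installed" ] && [ "$installed" != "$minimum" ]; then
  echo "NCCL $installed predates the P2P fix; upgrade to >= $minimum"
else
  echo "NCCL $installed is recent enough"
fi
```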

The official vLLM Docker images may not yet include the fixed NCCL version by default, so you must manually update it. For example, add RUN pip install nvidia-nccl-cu12==2.27.7 to your Dockerfile after the base image is pulled. This is a widely reported and resolved issue, and upgrading NCCL is the recommended solution. Would you like step-by-step instructions for updating your Docker image?

Can you give me step-by-step instructions for upgrading NCCL to 2.26.5?
I have pulled the latest vLLM Docker image, and I am currently in the VM, not in the container. So what should I do next?

To upgrade NCCL to 2.26.5 in your vLLM Docker container, follow these steps:

  1. Start a shell in your running vLLM Docker container (replace vllm_container with your actual container name or ID):
docker exec -it vllm_container /bin/bash
  2. Uninstall the old NCCL version and install 2.26.5 inside the container:
pip uninstall -y nvidia-nccl-cu12
pip install nvidia-nccl-cu12==2.26.5
  3. Restart the vLLM server inside the container if it was running.

This will ensure vLLM uses NCCL 2.26.5, which fixes the multi-GPU issue on RTX 5090. For more details, see this discussion and this example Dockerfile snippet.
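To confirm the upgrade took effect, you can parse the Version field out of pip show. A sample of the output is inlined below so the parsing runs end-to-end; inside the container you would pipe the real `pip show nvidia-nccl-cu12` instead (the sample lines are illustrative, not captured from a real install):

```shell
# Extract the Version line from `pip show` output with awk.
# Inside the container, the real check is:
#   pip show nvidia-nccl-cu12 | awk -F': ' '/^Version/{print $2}'
sample_output="Name: nvidia-nccl-cu12
Version: 2.26.5
Summary: NVIDIA Collective Communication Library (NCCL) Runtime"
version=$(printf '%s\n' "$sample_output" | awk -F': ' '/^Version/{print $2}')
echo "Installed NCCL: $version"
```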

Would you like a Dockerfile example for future builds?


I updated the package in a Docker container which was started with this flag:
--tensor-parallel-size 1

How can I now change that 1 to 2?
docker start won't do it. I am new to Docker.

You cannot change the --tensor-parallel-size argument of a running container. You must stop and remove the current container, then start a new one with the desired argument. For example, stop and remove the old container:

docker stop <container_name>
docker rm <container_name>

Then run a new container with --tensor-parallel-size 2:

docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model <your_model> \
  --tensor-parallel-size 2

See official Docker usage docs for more details. Would you like more step-by-step guidance?
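Before switching to --tensor-parallel-size 2, it's worth confirming that two GPUs are actually visible, since the tensor-parallel size cannot exceed the GPU count. A minimal sketch, with sample `nvidia-smi -L` output inlined (the UUIDs are made up; run the real command on your host instead):

```shell
# The tensor-parallel size must be <= the number of visible GPUs.
# Sample output of `nvidia-smi -L`; substitute the real command on your host.
gpus="GPU 0: NVIDIA GeForce RTX 5090 (UUID: GPU-xxxx)
GPU 1: NVIDIA GeForce RTX 5090 (UUID: GPU-yyyy)"
gpu_count=$(printf '%s\n' "$gpus" | grep -c '^GPU')
tp=2
if [ "$tp" -le "$gpu_count" ]; then
  echo "OK: $gpu_count GPUs visible, tp=$tp fits"
else
  echo "ERROR: tp=$tp exceeds $gpu_count visible GPUs"
fi
```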


But then I have the old NCCL in it again, which I had just updated in the running container.
So how can I update NCCL in the vLLM Docker image and then run it with tensor parallel = 2?

Give me step-by-step instructions, where the goal is to update the current vLLM Docker container to work with multiple 5090s (update the NCCL package) and then be able to run it with the updated package.

Take the Docker launch command above, feed it to ChatGPT, Gemini, or whatever you use, also mention the NCCL upgrade, and ask it to create a docker-compose.yml file for you. Attempting to do the NCCL upgrade via the run command is a pain.

If you don't want to do that, then every time you launch the container with the updated TP command, you need to exec into it and execute the NCCL upgrade command.

You're correct: upgrading NCCL via docker exec is only temporary; each new container started from the original image reverts to its bundled NCCL version. The best practice is to create a custom Docker image with NCCL 2.26.5 pre-installed, then use that image in your docker-compose.yml. This way, every time you launch the container (with any --tensor-parallel-size), the correct NCCL version is present automatically.

Step 1: Create a Dockerfile to upgrade NCCL

FROM vllm/vllm-openai:latest
RUN pip uninstall -y nvidia-nccl-cu12 && pip install nvidia-nccl-cu12==2.26.5

Step 2: Build your custom image

docker build -t vllm-nccl-2.26.5 .

Step 3: Create a docker-compose.yml file

version: "3.9"
services:
  vllm:
    image: vllm-nccl-2.26.5
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    ports:
      - "8000:8000"
    ipc: host
    shm_size: 16g
    command: >
      --model <your_model>
      --tensor-parallel-size 2

Now, launching with docker-compose up will always use the correct NCCL version and your desired tensor parallelism. For more, see official Docker deployment docs.

Would you like more details or help customizing this for your setup?
