Help with an error when running vLLM with tensor parallelism

RunLLM · May 21, 2025, 8:26am

When vLLM gets stuck after printing lines such as "Using FlashAttention backend." and "vLLM is using nccl==2.18.1", it usually means the process is waiting for all tensor-parallel workers to initialize and synchronize, but something is blocking the distributed setup. This is a common issue with multi-GPU (tensor-parallel) vLLM setups.

Checklist to resolve:

Sufficient Shared Memory:
Make sure your Docker container has enough shared memory (/dev/shm); this is critical for multi-GPU runs. Start the container with --shm-size=32g or higher (e.g. --shm-size=64g for large models).
See vllm/issues/1437.
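To confirm from inside the container that the shared-memory setting actually took effect, a minimal sketch like the following can be used. The helper names shm_size_gib and check_shm are hypothetical (not part of vLLM), and the 32 GiB default simply mirrors the figure above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of vLLM): check the size of /dev/shm.

# Print the total size, in whole GiB (rounded down), of the filesystem
# that backs a path (default: /dev/shm).
shm_size_gib() {
  # -P gives one POSIX-format line per filesystem; -k reports 1024-byte blocks.
  df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { print int($2 / 1024 / 1024) }'
}

# Usage: check_shm [path] [min_gib]
# Returns 0 if the mount is at least min_gib GiB (default 32), 1 otherwise.
check_shm() {
  local path="${1:-/dev/shm}"
  local min_gib="${2:-32}"
  local size
  size="$(shm_size_gib "$path")"
  if [ "${size:-0}" -lt "$min_gib" ]; then
    echo "WARNING: $path is ${size:-0} GiB; restart the container with --shm-size=${min_gib}g or larger" >&2
    return 1
  fi
  echo "OK: $path is $size GiB"
}
```

Docker's default /dev/shm is only 64 MB, so in a container started without --shm-size (or --ipc=host), check_shm reports 0 GiB and returns 1.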
No CUDA Initialization Before vLLM:
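You already restarted Docker, which is good. Make sure no other process or script in the container initializes CUDA before vLLM starts. In particular, code that uses CUDA in the parent process before vLLM starts its workers can break worker startup, because CUDA cannot be re-initialized in a forked subprocess.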
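Before launching, it can help to confirm that nothing else already holds a CUDA context on the GPUs. The sketch below wraps the real nvidia-smi query --query-compute-apps; the function name gpus_idle and the NVIDIA_SMI override variable are hypothetical, added only so the command can be swapped out (for example, for testing). Inside a container, nvidia-smi may not list processes that belong to other containers or the host, so an empty list is a hint, not a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vLLM or nvidia-smi): succeed only if no
# process currently holds a CUDA context on any visible GPU.
# NVIDIA_SMI may name a different command; it defaults to the real binary.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

gpus_idle() {
  local procs
  # One CSV line per compute process: pid, process name, GPU memory used.
  if ! procs="$("$NVIDIA_SMI" --query-compute-apps=pid,process_name,used_memory --format=csv,noheader)"; then
    echo "could not query GPUs with $NVIDIA_SMI" >&2
    return 2
  fi
  if [ -n "$procs" ]; then
    echo "GPUs are in use; stop these processes before starting vLLM:" >&2
    printf '%s\n' "$procs" >&2
    return 1
  fi
  echo "No compute processes found on the GPUs."
}
```

A return code of 1 means some process is already on the GPUs; 2 means nvidia-smi itself could not be run.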
Correct NCCL and CUDA Environment:
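- Set NCCL_P2P_DISABLE=1 as an environment variable before starting vLLM. This disables NCCL's peer-to-peer transport, so NCCL falls back to going through host shared memory (at some cost in bandwidth). It can help if your GPUs are not fully NVLink-connected or if you see NCCL hangs.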
- Set CUDA_VISIBLE_DEVICES=0,1 (or whichever GPUs you want to use) before running vLLM.
- If running in Docker, use --gpus all and the NVIDIA runtime.
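Put together, the environment settings above might look like the launch sketch below. To judge whether NCCL_P2P_DISABLE=1 is worth trying, nvidia-smi topo -m shows whether each GPU pair is linked by NVLink (NV#) or only over PCIe. In the sketch, vllm serve and its --tensor-parallel-size flag are real vLLM CLI options; count_visible_gpus and launch_vllm are hypothetical helpers, and deriving the tensor-parallel size from CUDA_VISIBLE_DEVICES is just one convenient convention, not something vLLM requires.

```shell
#!/usr/bin/env bash
# Sketch of a single-node tensor-parallel launch (helpers are hypothetical).
export CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:-0,1}"  # GPUs vLLM may use
export NCCL_P2P_DISABLE="${NCCL_P2P_DISABLE:-1}"            # workaround for NCCL P2P hangs

# Count the comma-separated entries in CUDA_VISIBLE_DEVICES.
count_visible_gpus() {
  printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
}

# Start the OpenAI-compatible server with one tensor-parallel rank per GPU.
launch_vllm() {
  local model="${1:?usage: launch_vllm <model>}"
  vllm serve "$model" --tensor-parallel-size "$(count_visible_gpus)"
}
```

After sourcing the script, launch_vllm <your_model> starts the server with a tensor-parallel size that matches the visible GPUs, which avoids a mismatch between the two settings.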
Ray and vLLM Version Compatibility:
Some Ray/vLLM version combinations can cause hangs. Use a recent vLLM release together with a Ray version it supports.
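When comparing against a known-good setup or asking for help, it is useful to record exact versions. A minimal sketch, assuming python3 (3.8 or newer) is the interpreter vLLM is installed into; it reads package metadata with importlib.metadata, so it does not import vllm or torch and therefore does not initialize CUDA. The function name report_versions is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print installed versions of vllm, ray and torch
# from package metadata (no imports, so CUDA is never touched).
report_versions() {
  python3 - vllm ray torch <<'EOF'
import sys
from importlib.metadata import PackageNotFoundError, version

# sys.argv is ["-", "vllm", "ray", "torch"] when the script comes from stdin.
for name in sys.argv[1:]:
    try:
        print(f"{name}: {version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
EOF
}
```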
Wait for Model Loading:
For large models, loading and initialization can take several minutes, especially on the first run (when weights may still be downloading) or with slow disks. Watch nvidia-smi: if GPU memory usage keeps increasing, the model is still loading.
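A minimal polling sketch, assuming nvidia-smi is on the PATH: it sums memory.used across GPUs at a fixed interval and reports whether the total ever changed. The helper names, the NVIDIA_SMI override, and the default interval (30 s) and poll count (10) are illustrative assumptions. A flat reading over several minutes suggests a hang rather than slow loading, but it is not conclusive.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for telling "still loading" apart from "stuck".
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# Total memory in use (MiB), summed over all visible GPUs.
gpu_mem_used_mib() {
  "$NVIDIA_SMI" --query-gpu=memory.used --format=csv,noheader,nounits |
    awk '{ total += $1 } END { print total + 0 }'
}

# Usage: mem_is_growing [interval_seconds] [polls]
# Prints the total after each poll; returns 0 if it changed at least once,
# 1 if it stayed flat for the whole window (a hint that startup is stuck).
mem_is_growing() {
  local interval="${1:-30}" polls="${2:-10}"
  local prev cur i=0 changed=1
  prev="$(gpu_mem_used_mib)"
  while [ "$i" -lt "$polls" ]; do
    sleep "$interval"
    cur="$(gpu_mem_used_mib)"
    echo "GPU memory used: ${cur} MiB"
    if [ "$cur" != "$prev" ]; then changed=0; fi
    prev="$cur"
    i=$((i + 1))
  done
  return "$changed"
}
```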
Check for Deadlocks or Errors:
If it hangs for more than 10-15 minutes, check Ray logs (/tmp/ray/session_latest/logs/) for errors, or run with debug logging:
```
export VLLM_LOGGING_LEVEL=DEBUG
export NCCL_DEBUG=TRACE
python /root/vllm/vllm/entrypoints/openai/api_server.py ...
```
See the vLLM troubleshooting guide.
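If Ray is in use, a quick way to skim its logs for likely culprits is sketched below. scan_ray_logs is a hypothetical name, and the search pattern is only a starting point; once NCCL_DEBUG is set, NCCL prints its warnings as "NCCL WARN" lines, which the pattern also catches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list log lines that look like errors, timeouts or
# NCCL warnings. Defaults to Ray's latest session log directory.
scan_ray_logs() {
  local dir="${1:-/tmp/ray/session_latest/logs}"
  if [ ! -d "$dir" ]; then
    echo "No log directory at $dir" >&2
    return 2
  fi
  # -r recurse, -I skip binary files, -n line numbers, -i ignore case,
  # -E extended regex. grep exits with 1 when nothing matches.
  grep -rIniE 'error|exception|timeout|timed out|nccl warn' "$dir"
}
```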

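Example Docker run:

```
docker run --gpus all --shm-size=32g --ipc=host -it <your_image>
```

--ipc=host lets the container use the host's shared memory; the vLLM Docker documentation presents it as an alternative to --shm-size, so either flag alone is usually enough.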
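For the official vllm/vllm-openai image, whose entrypoint is the OpenAI-compatible server, a fuller command might look like the sketch below. run_vllm_container and its DRY_RUN switch are hypothetical; the model name is a placeholder, port 8000 is the server's default, and the sketch uses --ipc=host instead of --shm-size.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around docker run for the vllm/vllm-openai image.
# With DRY_RUN=1 it only prints the command, so flags can be checked first.
run_vllm_container() {
  local model="${1:?usage: run_vllm_container <model> [tp_size]}"
  local tp="${2:-2}"
  # Arguments after the image name are passed to the vLLM server.
  set -- docker run --rm --gpus all --ipc=host \
    -e NCCL_P2P_DISABLE=1 \
    -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model "$model" --tensor-parallel-size "$tp"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 run_vllm_container <your_model> 2 prints the full command; drop DRY_RUN to execute it.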

Would you like more detailed step-by-step debugging instructions?
