RTX Pro 6000 Tensor Parallelism CUBLAS_STATUS_ALLOC_FAILED

RunLLM · September 13, 2025, 6:56pm

If the attention head count is divisible by your tensor_parallel_size and you still see CUBLAS_STATUS_ALLOC_FAILED, the likely cause is either GPU memory exhaustion or a compatibility problem between vLLM, CUDA 12.9, and your RTX Pro 6000 GPUs. This is a known problem on some Blackwell/Lovelace GPUs with recent CUDA/NCCL versions, where tensor parallelism (TP) triggers cuBLAS or allocation errors even when the configuration is correct. Users have reported that upgrading NCCL (e.g., nvidia-nccl-cu12==2.27.3) can resolve these allocation errors on similar hardware and CUDA versions, as seen in vLLM Issue #21239.
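The divisibility condition above can be checked before launching anything. This is only a minimal sketch: `heads_divisible` is a hypothetical helper, not part of vLLM, and the head counts in the example calls are illustrative values rather than those of any particular model.

```python
def heads_divisible(num_attention_heads: int, tensor_parallel_size: int) -> bool:
    """Return True if the attention heads split evenly across the TP ranks.

    Hypothetical helper for illustration; it is not a vLLM API.
    """
    if num_attention_heads <= 0 or tensor_parallel_size <= 0:
        raise ValueError("both arguments must be positive integers")
    return num_attention_heads % tensor_parallel_size == 0


# Illustrative values only: 64 heads across 2 GPUs, then 40 heads across 3.
print(heads_divisible(64, 2))
print(heads_divisible(40, 3))
```

If this returns True and the error still appears, the head layout is not the cause, and memory pressure or the NCCL/CUDA combination is the next thing to look at.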

If you have not already done so, update your NCCL library to the latest version. If the problem persists, it may be a compatibility issue that vLLM has not yet fully resolved for your hardware/CUDA combination. Would you like more detailed troubleshooting steps or guidance on alternative parallelism strategies?
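To see whether an NCCL upgrade is still pending, you can compare the installed pip package against the version mentioned above. The sketch below assumes NCCL was installed as the `nvidia-nccl-cu12` wheel in the same Python environment as vLLM; the helper names are hypothetical, and 2.27.3 is used as the threshold only because it is the version cited in this thread.

```python
import re
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Parse the leading numeric components of 'X.Y.Z' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(match.group()))
    return tuple(parts)


def installed_nccl_version(package: str = "nvidia-nccl-cu12"):
    """Return the installed version string of the wheel, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def nccl_at_least(target: str = "2.27.3", package: str = "nvidia-nccl-cu12"):
    """Return True/False for installed >= target, or None if not installed."""
    installed = installed_nccl_version(package)
    if installed is None:
        return None
    return parse_version(installed) >= parse_version(target)


print("installed:", installed_nccl_version())
print("at least 2.27.3:", nccl_at_least())
```

A result of None means the wheel was not found in this environment, in which case NCCL may be coming from another source (for example a system install) and this check does not apply.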

Sources:

vLLM Issue #21239
