To debug this issue, first ensure both systems have identical NVIDIA driver and CUDA versions (which you confirmed: 580.126.09, CUDA 13.0). Next, compare the output of `nvidia-smi` for both cards, focusing on the "CUDA Version," "Driver Version," and "Persistence-M" fields. Persistence mode differences are generally not the cause of kernel launch failures, but for consistency you can enable it on both machines with `sudo nvidia-smi -pm 1` (NVIDIA docs).
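To make the side-by-side comparison less error-prone, a small sketch like the one below can parse the output of `nvidia-smi --query-gpu=name,driver_version,persistence_mode,compute_cap --format=csv,noheader` and report which fields differ between two machines. It assumes a driver recent enough to support the `compute_cap` query field; the helpers `parse_query_gpu` and `diff_gpus` are hypothetical names for illustration. The CUDA version shown in the `nvidia-smi` banner is not among the fields queried here, so compare that from the plain `nvidia-smi` output.

```python
import csv
import io
import shutil
import subprocess

# Properties requested via `nvidia-smi --query-gpu=...` (assumed supported by the driver).
FIELDS = ["name", "driver_version", "persistence_mode", "compute_cap"]


def parse_query_gpu(csv_text):
    """Parse `--format=csv,noheader` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text.strip())):
        if not record:
            continue
        # nvidia-smi separates values with ", ", so strip the padding.
        values = [v.strip() for v in record]
        rows.append(dict(zip(FIELDS, values)))
    return rows


def diff_gpus(a, b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    return {f: (a.get(f), b.get(f)) for f in FIELDS if a.get(f) != b.get(f)}


def query_local():
    """Run nvidia-smi on this machine and return its raw CSV text."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        for gpu in parse_query_gpu(query_local()):
            print(gpu)
```

Run it on each machine, save the printed rows, then pass one row from each system to `diff_gpus` to list only the fields that differ.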
For step-by-step debugging:
- Set `CUDA_LAUNCH_BLOCKING=1` in your environment before running vLLM so that kernel errors are reported at the call that actually failed.
- Run a minimal CUDA check (e.g., `python -c "import torch; print(torch.cuda.is_available())"`) on both systems. This only confirms that PyTorch can see the GPU and does not launch any kernel, so also run a small GPU operation such as a matrix multiply.
- Check that vLLM and all dependencies (PyTorch, CUDA libraries) are built for the correct compute capability: 12.0 (`sm_120`) for both the RTX 5090 and the RTX PRO 5000.
- If the error persists only on the RTX PRO 5000, try running with a different model or with `--enforce-eager` to disable CUDA graphs, which helps rule out CUDA graph capture as the trigger, since some kernels may not be compatible with it.
- Review the vLLM troubleshooting guide for additional environment variables and debugging tips.
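The first three steps can be combined into one diagnostic script. The sketch below assumes PyTorch is installed; it sets `CUDA_LAUNCH_BLOCKING` before CUDA is initialized, launches a real kernel (a matmul), and checks whether PyTorch's compiled architecture list (`torch.cuda.get_arch_list()`) contains native kernels for the device's compute capability. The helper `arch_supported` is a hypothetical name, and this only inspects PyTorch's build, not vLLM's own compiled extensions.

```python
import os

# Must be set before CUDA is initialized (i.e. before torch touches the GPU)
# so that kernel launches are synchronous and errors point at the failing call.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def arch_supported(capability, arch_list):
    """True if the build ships native SASS for this capability, e.g. (12, 0) -> 'sm_120'.

    A PTX-only entry (e.g. 'compute_120') could still be JIT-compiled, so a
    False result here is a warning sign rather than proof of failure.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to PyTorch")
        return
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("device:", torch.cuda.get_device_name(0), "capability:", cap)
    print("torch CUDA:", torch.version.cuda, "archs:", archs)
    if not arch_supported(cap, archs):
        print("warning: no sm_%d%d kernels in this PyTorch build" % cap)
    # is_available() launches nothing; this matmul exercises a real kernel.
    x = torch.randn(1024, 1024, device="cuda")
    y = (x @ x).sum()
    torch.cuda.synchronize()
    print("matmul OK:", y.item())


if __name__ == "__main__":
    main()
```

Running this identically on both machines and comparing the output should show whether the failure is already reproducible in plain PyTorch or only appears inside vLLM.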
Would you like a sample script or more details on any of these steps?