Do the RTX 5090 and RTX PRO 5000 have any differences that should be taken into account?

To debug this issue, first ensure both systems have identical NVIDIA driver and CUDA versions (which you confirmed: 580.126.09, CUDA 13.0). Next, compare the output of nvidia-smi for both cards, focusing on the “CUDA Version,” “Driver Version,” and “Persistence-M” settings. Persistence mode differences are generally not the cause of kernel launch failures, but you can set it to “On” for both with nvidia-smi -pm 1 for consistency (NVIDIA docs).
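To make the nvidia-smi comparison easier to diff between the two machines, here is a minimal sketch that collects the relevant fields in one line per GPU. It assumes nvidia-smi is on the PATH and that the driver supports the `compute_cap` query field; the helper names (`parse_gpu_line`, `query_gpus`) are made up for illustration. The "CUDA Version" shown in the nvidia-smi header is not part of this query, so check that one by eye.

```python
import shutil
import subprocess

# Fields requested from `nvidia-smi --query-gpu`.
FIELDS = ["name", "driver_version", "persistence_mode", "compute_cap"]


def parse_gpu_line(line):
    """Turn one CSV line of nvidia-smi output into a dict keyed by FIELDS.

    Assumes no field contains a comma (hypothetical simplification).
    """
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Run it on both systems and compare the printed dictionaries; any field that differs besides the name is worth a closer look.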

For step-by-step debugging:

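  1. Set CUDA_LAUNCH_BLOCKING=1 in your environment before running vLLM. This makes kernel launches synchronous, so the error trace points at the kernel that actually failed.
  2. Run a minimal CUDA check (e.g., python -c "import torch; print(torch.cuda.is_available())") on both systems to confirm that PyTorch can see the GPU. Note that this only checks device visibility; it does not launch a kernel.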
  3. Check that vLLM and all dependencies (PyTorch, CUDA libraries) are built for the correct compute capability (12.0, i.e. sm_120, for the RTX 5090 and RTX PRO 5000).
  4. If the error persists only on the RTX PRO 5000, try running with a different model or with --enforce-eager to disable CUDA graphs, as some kernels may not be compatible.
  5. Review the vLLM troubleshooting guide for additional environment variables and debugging tips.
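Steps 1 to 3 can be combined into one small script to run on both machines. This is a rough sketch, not an official vLLM or PyTorch diagnostic: `arch_supported` is a hypothetical helper, and it only checks for a native `sm_XY` entry in the PyTorch build, so a build that relies on PTX forward compatibility could still work even when it reports False.

```python
import os

# Step 1: set this before CUDA is initialised, i.e. before importing torch.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def arch_supported(capability, arch_list):
    """Return True if arch_list contains native code (sm_XY) for capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return

    if not torch.cuda.is_available():
        print("CUDA is not available to PyTorch")
        return

    # Step 3: compare the GPU's capability with what this PyTorch build targets.
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("device:", torch.cuda.get_device_name(0))
    print("compute capability:", cap)
    print("PyTorch built for:", archs)
    print("native kernels for this GPU:", arch_supported(cap, archs))

    # Step 2: launch a real kernel and read the result back, so a launch
    # failure surfaces here rather than later inside vLLM.
    x = torch.ones(1024, device="cuda")
    total = (x + x).sum().item()
    print("kernel result:", total)


if __name__ == "__main__":
    main()
```

If the script passes on the RTX 5090 but fails on the RTX PRO 5000, the problem is below vLLM (driver, PyTorch build, or hardware); if it passes on both, the vLLM-specific steps 4 and 5 are the next place to look.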

Would you like a sample script or more details on any of these steps?
