CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This error occurs because the official vLLM Docker images and wheels (up to v0.8.5) were not compiled with CUDA architectures for newer GPUs such as the RTX 5090 (compute capability sm_120). vLLM 0.9.0+ adds support, but until an official image including sm_120 is released you must build the Docker image yourself with the correct CUDA architecture flags, passing torch_cuda_arch_list="12.0 12.1" as a build argument. The error persists with any prebuilt image or wheel that lacks these architectures, even when the correct CUDA and PyTorch versions are installed. For more details and troubleshooting, see vLLM issues #19098 and #18995.
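You can confirm that the failure is an architecture mismatch by comparing the architectures compiled into the installed PyTorch build against the GPU's compute capability (a quick diagnostic sketch; the exact lists printed depend on your build):

```python
import torch

# Architectures the installed PyTorch build was compiled for,
# e.g. ['sm_80', 'sm_86', 'sm_90', ...]. If no 'sm_120' entry
# appears, kernels for the RTX 5090 are missing from this build.
print(torch.cuda.get_arch_list())

# Compute capability of the installed GPU; an RTX 5090
# reports (12, 0), i.e. sm_120.
if torch.cuda.is_available():
    print(torch.cuda.get_device_capability(0))
```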
To resolve, build the Docker image with the appropriate build arguments. Example build command:
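A minimal sketch, run from the root of a vLLM source checkout (v0.9.0 or later); the image tag vllm-local is illustrative, and torch_cuda_arch_list is the build argument defined in vLLM's Dockerfile:

```shell
# Build the OpenAI-compatible server image, compiling kernels for
# sm_120 / sm_121 to match the RTX 5090's compute capability.
DOCKER_BUILDKIT=1 docker build . \
  --target vllm-openai \
  --tag vllm-local \
  --build-arg torch_cuda_arch_list="12.0 12.1"
```

Once built, run the container as you would the official vllm/vllm-openai image; the locally compiled kernels will load on the 5090 without the "no kernel image" error.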