Docker image `vllm/vllm-openai:v0.9.0` doesn't work on 5090

Hi forum,
Following the steps in [Doc]: Steps to run vLLM on your RTX5080 or 5090! · Issue #14452 · vllm-project/vllm · GitHub, I was able to build a vLLM v0.9.0 Docker image myself and run it on a 5090, but the DockerHub vLLM v0.9.0 image doesn't work: it fails with RuntimeError: CUDA error: no kernel image is available for execution on the device at inference time (API call to /v1/chat/completions).
Does anybody else have the same issue, or should I try a different approach? Here's the setup I've used:

  • docker run command

    docker run -d --gpus "device=0" -p "8000:8000" -v ./model/:/model/:ro -v ./entrypoint.sh:/workspace/entrypoint.sh:ro --entrypoint /workspace/entrypoint.sh vllm/vllm-openai:v0.9.0
    
  • entrypoint.sh

    #!/bin/bash
    # Start the OpenAI-compatible API server against the model mounted at /model.
    python3 -m vllm.entrypoints.openai.api_server \
      --model /model \
      --host 0.0.0.0 --port 8000 \
      --trust-remote-code
    

Note: the model I used is a Llama-based one. It works well on a 4090 and a 6000 Ada.

Yes, this is a known issue: the official vLLM DockerHub image v0.9.0 does not support the RTX 5090 (Blackwell, compute capability sm_120) out of the box, resulting in RuntimeError: CUDA error: no kernel image is available for execution on the device. This is because the prebuilt image does not ship CUDA kernels for the new architecture. You must build the Docker image yourself with the correct CUDA arch flags (e.g., --build-arg torch_cuda_arch_list="12.0 12.1") to ensure compatibility with the 5090, as described in vLLM issue #14452, vLLM issue #16901, and vLLM issue #19098.
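
Roughly, the build looks like this (a sketch based on issue #14452, not an official recipe: the docker/Dockerfile path, the max_jobs value, and the local image tag are assumptions you may need to adjust for the v0.9.0 tree):

    # Build vLLM v0.9.0 from source with Blackwell (sm_120) kernels compiled in.
    git clone --branch v0.9.0 https://github.com/vllm-project/vllm.git
    cd vllm
    DOCKER_BUILDKIT=1 docker build . \
      --file docker/Dockerfile \
      --target vllm-openai \
      --build-arg torch_cuda_arch_list="12.0 12.1" \
      --build-arg max_jobs=8 \
      --tag vllm-openai-blackwell:v0.9.0

You can then reuse your existing docker run command, swapping vllm/vllm-openai:v0.9.0 for the locally built tag.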

Until an official image with Blackwell support is released, continue using your custom-built image. The official image works on the 4090 and 6000 Ada because those architectures are included in the prebuilt kernels, but the 5090's is not. For more details and troubleshooting, see the linked issues above.

Would you like step-by-step build instructions or troubleshooting tips?


Thanks @RunLLM,
Could you tell me if it still happens with v0.9.0.1?

Yes, the issue still occurs with the vLLM DockerHub image v0.9.0.1: the official images do not include CUDA kernels for the RTX 5090 (sm_120), so you will get the same RuntimeError: CUDA error: no kernel image is available for execution on the device at inference time. Multiple users have confirmed this in vLLM issue #13306 and vLLM issue #16901.

You must continue to build the Docker image yourself with --build-arg torch_cuda_arch_list="12.0 12.1" for 5090 support until the official images are updated to include these architectures.
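
If you want to check an image before running inference, one quick probe (a sketch: it assumes the vllm/vllm-openai:v0.9.0.1 tag and uses the PyTorch bundled in the image) is to print the CUDA architectures its PyTorch build was compiled for:

    # If sm_120 is missing from this list, the 5090 will fail with
    # "no kernel image is available for execution on the device".
    docker run --rm --gpus "device=0" --entrypoint python3 \
      vllm/vllm-openai:v0.9.0.1 \
      -c "import torch; print(torch.cuda.get_arch_list())"

On the 4090 and 6000 Ada the same check shows sm_89 (Ada Lovelace) in the list, which is why those cards work with the stock images.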

Would you like detailed build steps or troubleshooting for custom images?
