Project: Docker container for running vLLM smoothly on RTX 5090 + WSL2

Finally got vLLM running smoothly on RTX 5090 + WSL2, so I made a Docker container for everyone. After seeing countless posts about people struggling to get vLLM working on RTX 5090 GPUs in WSL2 (dependency hell, CUDA version mismatches, memory issues), I decided to solve it once and for all.

There is no official vLLM documentation or support for the BoltzmannEntropy/vLLM-5090 Docker container, but community members have reported success running vLLM on the RTX 5090 under WSL2 using custom Docker builds or this container. The official vLLM docs recommend Docker for deployment and give guidance for building images that target new GPUs such as the RTX 5090, but they do not endorse or document this third-party container. For an officially supported setup, follow the official Docker build instructions, use CUDA 12.8+, and pass the correct build arguments for Blackwell GPUs (e.g., --build-arg torch_cuda_arch_list="12.0 12.1").
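For reference, here is a minimal sketch of that official build path. The image tag and max_jobs value are illustrative, and the Dockerfile path has moved between vLLM releases, so check the current docs for the exact location and supported build arguments:

```bash
# Sketch: building a Blackwell-capable vLLM image from the official repo.
# Assumes Docker with BuildKit and the NVIDIA Container Toolkit in WSL2.
git clone https://github.com/vllm-project/vllm.git
cd vllm

# torch_cuda_arch_list targets Blackwell; max_jobs=8 is illustrative --
# tune it to your CPU/RAM so the CUDA compile doesn't exhaust memory.
DOCKER_BUILDKIT=1 docker build . \
  --target vllm-openai \
  --file docker/Dockerfile \
  --build-arg max_jobs=8 \
  --build-arg torch_cuda_arch_list="12.0 12.1" \
  --tag vllm-rtx5090:local
```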

If you want a prebuilt solution and are comfortable with community containers, you can try the BoltzmannEntropy/vLLM-5090 image, but be aware that it is not maintained by the vLLM team. For the most reliable and up-to-date support, build from the official vLLM repository with the recommended CUDA and PyTorch versions, as described in the official docs.
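Whichever image you end up with, running it under WSL2 looks roughly like this. The image tag and model name below are placeholders (substitute the community image tag or your locally built one), and GPU passthrough assumes the NVIDIA Container Toolkit is installed in your WSL2 distro:

```bash
# Sketch: serving a model from a vLLM image under WSL2.
# Image tag and model are placeholders -- substitute your own.
docker run --gpus all \
  --ipc=host \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm-rtx5090:local \
  --model Qwen/Qwen2.5-7B-Instruct \
  --gpu-memory-utilization 0.90
```

Once the server is up, the OpenAI-compatible API should be reachable at http://localhost:8000/v1 from both WSL2 and the Windows host.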

Would you like more detail on building your own Docker image for RTX 5090?
