vLLM on RTX5090: Working GPU setup with torch 2.9.0 cu128

nasrfathy1 · January 13, 2026, 7:47am

I ran into the exact same issue on WSL (Ubuntu 24.04) and it turned out to be a binary compatibility problem between PyTorch, xFormers, FlashInfer, and the vLLM CUDA extension.
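Before changing anything, it can help to record which versions are actually installed, since a mismatch between these packages is what caused the problem here. Below is a minimal sketch using only the Python standard library. The distribution names in PACKAGES are my assumptions (in particular, the FlashInfer wheel may be registered as "flashinfer-python" or under another name), so adjust them to match your environment.

```python
from importlib import metadata

# Distribution names are assumptions; adjust them if your wheels are named
# differently (for example, FlashInfer may install as "flashinfer-python").
PACKAGES = ["torch", "xformers", "flashinfer-python", "vllm"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package so the output can be pasted into a bug report.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:20} {version or 'not installed'}")
```

Running this before and after the steps below gives a simple before/after record of what changed.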

What worked for me was building the dependencies explicitly and then installing vLLM from source instead of using prebuilt wheels.

Steps that fixed it for me:

First, install a compatible xFormers version:

pip install --no-cache-dir "xformers==0.0.33.post1"

Then build FlashInfer from source:

git clone https://github.com/flashinfer-ai/flashinfer.git --branch main --recursive ./flashinfer
cd ./flashinfer
git checkout cd928a7e044c94bdd96e3f7ca79a0514b253ea6d
git submodule update --init --recursive

pip install --no-cache-dir ninja build packaging "setuptools>=75.6.0"

python -m flashinfer.aot
python -m build --no-isolation --wheel
pip install dist/flashinfer*.whl

After that, I installed vLLM from source (not pip wheels), following the official source install steps.
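Once everything is installed, one quick sanity check is whether the PyTorch build in the environment ships native kernels for the GPU. This is a rough sketch, not part of the steps I ran above: arch_supported is a hypothetical helper name, and it only looks for an exact sm_XY entry (the RTX 5090 reports compute capability 12.0, i.e. sm_120), so it does not account for PTX forward compatibility.

```python
def arch_supported(arch_list, capability):
    """Return True if arch_list has a native sm_XY entry for (major, minor)."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        print("torch with a visible CUDA device is required for this check")
    else:
        capability = torch.cuda.get_device_capability(0)  # e.g. (12, 0)
        archs = torch.cuda.get_arch_list()  # architectures compiled into this build
        print("torch", torch.__version__, "CUDA", torch.version.cuda)
        print("device capability:", capability, "build archs:", archs)
        print("native kernels for this GPU:", arch_supported(archs, capability))
```

If the last line prints False, the installed torch build was most likely not compiled for this GPU, and rebuilding the extensions against it will not help until torch itself is replaced.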
