First, I’d like to say thank you so much for this excellent LLM software and for the hard work. I currently have an issue: I have an RTX 5090 (Blackwell, sm120) and have been trying to set it up for the past few days with GLM 4.6 and 4.7 flash. I tried building a custom vLLM, but it just doesn’t work due to incompatibility. I would really like to use vLLM over llama.cpp and Ollama, as the performance on vLLM is out of this world — it’s the fastest local AI I’ve ever seen, even compared to third-party AI platforms.
Official support for the RTX 5090 (Blackwell, sm120) has recently landed in vLLM, but pre-built wheels and Docker images may still lack full compatibility due to CUDA architecture flags and PyPI wheel size limits. The recommended workaround is to build vLLM from source with CUDA 12.8+ and the correct TORCH_CUDA_ARCH_LIST (including “12.0 12.1”) for Blackwell GPUs. Several users have confirmed this approach works, and official Docker images with sm120 support are expected in upcoming releases. See the linked issues for detailed build steps and discussion.
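As a rough sketch of the source build described above — the exact file names (e.g. `use_existing_torch.py`, `requirements/build.txt`) and the cu128 nightly index URL are assumptions that vary between vLLM versions, so check the linked issues and the vLLM build docs for your release:

```shell
# Assumes a CUDA 12.8+ toolkit is already installed and on PATH.
# Target Blackwell compute capabilities so kernels are compiled for sm120.
export TORCH_CUDA_ARCH_LIST="12.0 12.1"

# Install a PyTorch nightly built against CUDA 12.8 (index URL is an assumption;
# confirm the current one on pytorch.org).
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

# Build vLLM against the already-installed PyTorch instead of a pinned wheel.
git clone https://github.com/vllm-project/vllm.git
cd vllm
python use_existing_torch.py           # present in recent vLLM checkouts
pip install -r requirements/build.txt  # path differs in older releases
pip install -e . --no-build-isolation
```

If the build succeeds, `python -c "import vllm"` should complete without CUDA-architecture errors; if compilation is slow, limiting `MAX_JOBS` can keep memory use manageable.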
For GLM 4.6/4.7 and FlashInfer, make sure you use the latest PyTorch nightly (2.9+), vLLM 0.9.2 or newer, and set the appropriate environment variables for FlashInfer and MoE FP4/FP8 support. Example Docker build/run commands and troubleshooting tips are in the linked issues. Would you like a step-by-step guide for building and running vLLM on the RTX 5090 with GLM 4.6/4.7?
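To illustrate the kind of run command the linked issues describe — the image tag `vllm-sm120:latest` and the `<glm-model-id>` placeholder are hypothetical (substitute your locally built image and the actual Hugging Face model ID), while `VLLM_ATTENTION_BACKEND=FLASHINFER` is the standard vLLM switch for the FlashInfer backend:

```shell
# Run a source-built vLLM image on the RTX 5090 with FlashInfer enabled.
docker run --gpus all --ipc=host -p 8000:8000 \
  -e VLLM_ATTENTION_BACKEND=FLASHINFER \
  vllm-sm120:latest \
  --model <glm-model-id> \
  --trust-remote-code
```

`--ipc=host` avoids shared-memory errors with multi-worker loading; any additional FP4/FP8 MoE flags from the linked issues would be appended to the server arguments in the same way.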