No, the default `vllm/vllm-openai:latest` image will not work out of the box with `zai-org/GLM-4.7-Flash`: the model type `glm4_moe_lite` is not recognized by the standard Transformers release, so the error you are seeing is expected with older or default installations. To get support for this new architecture, you must install the latest vLLM nightly and install Transformers from source (main branch). See the official instructions and user reports for details; the required installation commands are:
```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install -U git+https://github.com/huggingface/transformers.git
```
After updating, you can serve the model as shown on the Hugging Face model card and in the related user issue threads.
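For reference, a typical launch looks like the sketch below. The flags here are illustrative assumptions, not the official command from the model card, so check the card for the recommended settings for your hardware:

```shell
# Illustrative sketch only: --tensor-parallel-size and --port are assumed
# values; adjust tensor parallelism to match your GPU count.
vllm serve zai-org/GLM-4.7-Flash \
    --tensor-parallel-size 2 \
    --port 8000
```

Once the server is up, it exposes an OpenAI-compatible API at `http://localhost:8000/v1`.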
Would you like more detail on how to build a custom Docker image or run these commands inside a container?