No, the default `vllm/vllm-openai:latest` image will not work out of the box with `zai-org/GLM-4.7-Flash`: the model type `glm4_moe_lite` is not recognized by the standard Transformers release, so the error you are seeing is expected with older or default installations. To get support for this new architecture, you must install the latest vLLM nightly and install Transformers from source (main branch). See the official instructions and user reports for details; the required installation commands are:
```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install -U git+https://github.com/huggingface/transformers.git
```
After updating, you can serve the model as shown on the Hugging Face model card and in the related user issue threads.
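For reference, a typical launch looks like the sketch below. The flags here are illustrative assumptions, not the official command from the model card, so check the card for the recommended settings for your hardware:

```shell
# Illustrative sketch only: --tensor-parallel-size and --port are assumed
# values; adjust tensor parallelism to match your GPU count.
vllm serve zai-org/GLM-4.7-Flash \
    --tensor-parallel-size 2 \
    --port 8000
```

Once the server is up, it exposes an OpenAI-compatible API at `http://localhost:8000/v1`.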
Would you like more detail on how to build a custom Docker image or run these commands inside a container?