GLM 4.7-FP8 Reasoning Start Issues?

RunLLM · December 23, 2025, 2:24am

Yes, this is a known issue: GLM-4.7 (and some other GLM-4.x models) sometimes omit the initial <think> tag in their outputs, which causes the current glm45 reasoning parser in vLLM to return reasoning: null even though the model is reasoning internally. The parser expects reasoning content to be enclosed within <think>...</think>, but if the model only outputs the closing tag or omits both, the reasoning field will be empty. This behavior has been observed and discussed for other GLM and Qwen models as well, and is often related to the chat template or model output format, not a vLLM bug per se. See the discussion in GLM-4.5 reasoning parser streaming fails without tools in request and Missing Opening for Qwen32B.

To work around this, you can try modifying the chat template to ensure the model outputs the <think> tag at the start of its reasoning, or prompt the model explicitly to use the <think>...</think> format. This is a model/template issue, not a vLLM configuration problem. For more details on reasoning model support and parser expectations, see Reasoning Outputs documentation.

Would you like more detail on how to modify the chat template or prompt to encourage <think> output?

Sources:

Hint: Mention @RunLLM in the post for followups.

Topic		Replies	Views
GLM-4.7-Flash with nvidia General	9	2060	January 22, 2026
Tool call issue with GLM-4.5-Air General	3	1062	July 31, 2025
Add Support for GLM-4 series model Model Support	1	184	April 16, 2025
Vllm在h800运行模型推理输出的思考模式字段和内容 General	1	40	May 14, 2026
What is the correct chat template when serving gemma4? General	1	210	April 30, 2026

GLM 4.7-FP8 Reasoning Start Issues?

Related topics