GLM 4.7-FP8 Reasoning Start Issues?

Yes, this is a known issue. GLM-4.7 (and some other GLM-4.x models) sometimes omit the opening <think> tag from their output. The glm45 reasoning parser in vLLM expects reasoning content to be enclosed in <think>...</think>, so when the model emits only the closing </think> tag, or neither tag, the parser returns reasoning: null even though the model did produce reasoning text. The same behavior has been reported for other GLM and Qwen models. It usually comes from the chat template or the model's output format rather than from a vLLM bug. See the discussions in "GLM-4.5 reasoning parser streaming fails without tools in request" and "Missing Opening for Qwen32B".
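To make the failure mode concrete, here is a minimal sketch of a strict tag-pair splitter. It is not vLLM's glm45 parser code; the function name `strict_split` and the regex are assumptions made only to illustrate why a missing opening tag leaves the reasoning field empty.

```python
import re
from typing import Optional, Tuple

# Hypothetical illustration, not the actual vLLM parser:
# reasoning is only recognized when BOTH tags are present.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def strict_split(text: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, content); reasoning is None without a full tag pair."""
    match = THINK_RE.search(text)
    if match is None:
        # No complete <think>...</think> pair: everything is treated as content.
        return None, text
    reasoning = match.group(1).strip()
    content = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, content


# Well-formed output: reasoning is extracted.
print(strict_split("<think>check units</think>The answer is 4."))
# Opening tag omitted: reasoning comes back as None.
print(strict_split("check units</think>The answer is 4."))
```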

To work around this, you can modify the chat template so that the model's reasoning starts with the <think> tag, or prompt the model explicitly to use the <think>...</think> format. Because the cause is in the model or template, changing vLLM configuration alone is unlikely to help. For more details on reasoning model support and parser expectations, see the Reasoning Outputs documentation.
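If changing the template or prompt is not practical, a different option is to post-process the raw completion text on the client. The sketch below is an assumption-laden fallback, not a vLLM feature: `tolerant_split` is a hypothetical helper that treats everything before the first </think> as reasoning, whether or not the opening tag is present. It only works on complete (non-streamed) text, and it cannot recover reasoning when both tags are missing.

```python
from typing import Optional, Tuple

OPEN_TAG, CLOSE_TAG = "<think>", "</think>"


def tolerant_split(text: str) -> Tuple[Optional[str], str]:
    """Split raw model text into (reasoning, content), tolerating a missing <think>."""
    head, sep, tail = text.partition(CLOSE_TAG)
    if not sep:
        # No closing tag at all: nothing can be identified as reasoning.
        return None, text
    head = head.strip()
    if head.startswith(OPEN_TAG):
        # Drop the opening tag when the model did emit it.
        head = head[len(OPEN_TAG):]
    reasoning = head.strip() or None
    return reasoning, tail.strip()


# Opening tag omitted, closing tag present: reasoning is still recovered.
print(tolerant_split("check units</think>The answer is 4."))
```

To use something like this you would need the unparsed text, for example by sending the request to a server started without a reasoning parser, so that the tags reach the client unchanged.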

Would you like more detail on how to modify the chat template or prompt to encourage <think> output?
