How to disable thinking for different models?

I want to call models without thinking via the v1/chat/completions interface.
For the glm-4.5 model, the body is:
{
    "model": "glm-4.5",
    "messages": message,
    "thinking": {
        "type": "disabled"
    },
    "temperature": 0.6,
    "tools": tools
}
But for the qwen3-8b model, the body is:
payload = {
    "model": "qwen3-8b",
    "messages": message,
    "enable_thinking": False,
    "temperature": 0.6,
    "tools": tools
}
I found that the request body differs between models (glm-4.5 uses a thinking key, qwen3 uses an enable_thinking key). Is there no way to unify them?

No, there is currently no unified key for disabling "thinking" across all models in vLLM. For GLM4.5 you must use the thinking key, while for Qwen3 you must use enable_thinking in the request body. This difference comes from model-specific implementations and is documented separately for each model; as of now there is no standard parameter that works for all reasoning models in vLLM. See the GLM4.5 reasoning parser docs and the Qwen3 reasoning outputs docs.
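Until a unified parameter exists, you can hide the difference in client code with a small helper that maps a single flag onto the key each model family expects. This is a minimal sketch, not part of any official API; the helper name and the prefix-based dispatch are my own, and only the two key shapes shown in the question are handled:

```python
def build_payload(model, messages, tools=None, temperature=0.6, thinking=False):
    """Build a /v1/chat/completions body, translating one `thinking`
    flag into the model-specific key.

    Hypothetical helper: extend the dispatch below for other models.
    """
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }
    if tools is not None:
        payload["tools"] = tools
    if model.startswith("glm"):
        # GLM4.5 expects a nested "thinking" object with a "type" field
        payload["thinking"] = {"type": "enabled" if thinking else "disabled"}
    elif model.startswith("qwen3"):
        # Qwen3 expects a top-level boolean "enable_thinking"
        payload["enable_thinking"] = thinking
    return payload
```

You would then send `build_payload("glm-4.5", message, tools=tools)` or `build_payload("qwen3-8b", message, tools=tools)` as the request body and keep the rest of the calling code identical.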

Would you like more detail on how to handle this in code or about the reasoning behind this design?
