How to disable thinking for different models?

I want to call models without thinking via the v1/chat/completions interface.
For the glm-4.5 model, the body is:
{
    "model": "glm-4.5",
    "messages": message,
    "thinking": {
        "type": "disabled"
    },
    "temperature": 0.6,
    "tools": tools
}
But for the qwen3-8b model, the body is:
payload = {
    "model": "qwen3-8b",
    "messages": message,
    "enable_thinking": False,
    "temperature": 0.6,
    "tools": tools
}
I found that the request body differs between models (glm-4.5 uses a thinking key, qwen3 uses an enable_thinking key). Is there no way to unify them?

No, there is currently no unified key for disabling "thinking" across all models in vLLM. For GLM4.5 you must use the thinking key, while for Qwen3 you must use enable_thinking in the request body. This difference comes from model-specific implementations and is documented separately for each model; as of now there is no standard parameter that works for all reasoning models in vLLM. See the GLM4.5 reasoning parser docs and the Qwen3 reasoning outputs docs.
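Until a unified parameter exists, you can hide the difference in client code with a small helper that maps a single flag onto the key each model family expects. This is a minimal sketch, not part of any official API; the helper name and the prefix-based dispatch are my own, and only the two key shapes shown in the question are handled:

```python
def build_payload(model, messages, tools=None, temperature=0.6, thinking=False):
    """Build a /v1/chat/completions body, translating one `thinking`
    flag into the model-specific key.

    Hypothetical helper: extend the dispatch below for other models.
    """
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }
    if tools is not None:
        payload["tools"] = tools
    if model.startswith("glm"):
        # GLM4.5 expects a nested "thinking" object with a "type" field
        payload["thinking"] = {"type": "enabled" if thinking else "disabled"}
    elif model.startswith("qwen3"):
        # Qwen3 expects a top-level boolean "enable_thinking"
        payload["enable_thinking"] = thinking
    return payload
```

You would then send `build_payload("glm-4.5", message, tools=tools)` or `build_payload("qwen3-8b", message, tools=tools)` as the request body and keep the rest of the calling code identical.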

Would you like more detail on how to handle this in code or about the reasoning behind this design?
