Deployment example for a Qwen3 model with hybrid thinking

Could you please show me a deployment example for a Qwen3 model with hybrid thinking, for example Qwen3-30B-A3B, and an example of disabling thinking?

To deploy Qwen3-30B-A3B with hybrid thinking (reasoning enabled), use the following command, specifying the reasoning parser as qwen3:

vllm serve Qwen/Qwen3-30B-A3B --enable-reasoning --reasoning-parser qwen3

To disable thinking (reasoning) for a specific request, pass chat_template_kwargs={"enable_thinking": False} in your API call or Python client:

from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible server.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=messages,
    # The OpenAI SDK has no chat_template_kwargs parameter, so pass it
    # through extra_body; vLLM forwards it to the chat template.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)

According to the docs, the Qwen3 reasoning feature is enabled by default and can be disabled per request as shown above. For more details, see the official reasoning outputs documentation and Qwen3 reasoning parser API.

Would you like more detail on hybrid thinking or deployment options?


--enable-reasoning is outdated.

Correct: the --enable-reasoning flag was deprecated and has been removed as of vLLM v0.10.0. Use only --reasoning-parser (e.g., --reasoning-parser qwen3) to enable reasoning mode for supported models. When --reasoning-parser is specified, reasoning is enabled automatically, so --enable-reasoning is no longer needed. See the engine args documentation and the deprecation notice.
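Concretely, the earlier serve command can be rewritten without the removed flag. This is a sketch assuming the same model and parser names as above:

```shell
vllm serve Qwen/Qwen3-30B-A3B --reasoning-parser qwen3
```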

Would you like more detail on migration or usage?


Do I still need to set chat_template_kwargs to disable thinking?

Yes. To disable thinking (reasoning) for Qwen3 and similar models, you still need to set chat_template_kwargs={"enable_thinking": False} in your API request. This is the recommended way to turn off reasoning on a per-request basis, even on the latest vLLM with a reasoning parser. See the reasoning outputs documentation and the Qwen3 usage guide.
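For clients that send raw HTTP requests instead of going through the OpenAI SDK, chat_template_kwargs sits at the top level of the JSON body. A minimal sketch of the request payload, assuming the same model and prompt as the earlier example:

```python
import json

# Request body for POST /v1/chat/completions on a vLLM server.
# "chat_template_kwargs" is a vLLM extension to the OpenAI schema,
# so it goes at the top level of the body, not inside a standard field.
payload = {
    "model": "Qwen/Qwen3-30B-A3B",
    "messages": [
        {"role": "user", "content": "Give me a short introduction to large language models."}
    ],
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
print(body)
```

With curl, you would POST this body to http://localhost:8000/v1/chat/completions with Content-Type: application/json.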

Would you like more detail or code examples?


I set it, but I still got the reasoning process?

If you set chat_template_kwargs={"enable_thinking": False} but still receive reasoning content, check that you are using the correct reasoning parser (qwen3 for Qwen3 models) and a recent vLLM version. Older versions, or an incorrect parser setting, can ignore the disable flag, so reasoning content appears regardless. This was fixed in recent vLLM releases: updating to v0.9.1 or later and serving with --reasoning-parser qwen3 should resolve it (reasoning outputs doc, bugfix PR).
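As a quick sanity check, you can compare your installed vLLM version against the v0.9.1 threshold mentioned above. A minimal sketch using only the standard library; the helper names are made up for illustration, and the v0.9.1 cutoff is taken from the discussion above:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Turn a version string like '0.9.1' into a comparable tuple.

    Takes only the leading digits of each of the first three
    components, so suffixes like '0.9.1rc1' still parse.
    """
    parts = []
    for p in v.split(".")[:3]:
        m = re.match(r"\d+", p)
        parts.append(int(m.group()) if m else 0)
    return tuple(parts)


def has_enable_thinking_fix(installed: str) -> bool:
    # Assumption from the discussion above: the disable-thinking
    # fix landed in vLLM v0.9.1.
    return parse_version(installed) >= (0, 9, 1)


try:
    print(has_enable_thinking_fix(version("vllm")))
except PackageNotFoundError:
    print("vllm is not installed")
```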

Would you like troubleshooting steps or a version check guide?
