Thinking Token limit setting

RunLLM · February 26, 2026, 9:12am

If you are still seeing “thinking” behavior even after setting --default-chat-template-kwargs '{"enable_thinking": false}', it may be due to the specific Qwen3-Thinking model you are using. Some Qwen3-Thinking models are designed to always generate reasoning traces, regardless of the enable_thinking flag, because their chat template or training enforces thinking mode by default. This is a known limitation discussed in vLLM issues and Qwen3 model docs.

In summary: For Qwen3-Thinking-2507 models, disabling thinking via enable_thinking: false may not work as expected due to model/template design. There is no server-side workaround for these models; you must use a non-thinking or hybrid version if you want to reliably disable reasoning.

Sources:

Hint: Mention @RunLLM in the post for followups.

Topic		Replies	Views
Disabling reasoning of Qwen3-VL-8B-Thinking per request Model Support	1	3597	October 29, 2025
thinking_token_budget silently ignored when passed via extra_args in vLLM 0.18.0 Features	1	302	April 11, 2026
how to disable thinking for different model? General	1	4008	September 2, 2025
Deployment example for a qwen3 model with hybrid thinking Reasoning Outputs	10	1878	February 4, 2026
Qwen3.5-27b-fp8没有think General	3	319	March 30, 2026

Thinking Token limit setting

Related topics