Thinking Token limit setting

If you still see “thinking” output after setting --default-chat-template-kwargs '{"enable_thinking": false}', the likely cause is the specific Qwen3-Thinking model you are using. Some Qwen3-Thinking models always generate reasoning traces regardless of the enable_thinking flag, because their chat template or training enforces thinking mode by default. This is a known limitation discussed in vLLM issues and the Qwen3 model docs.

In summary: for Qwen3-Thinking-2507 models, setting enable_thinking: false may have no effect because of the model and template design. There is no server-side workaround for these models; to reliably disable reasoning, use a non-thinking or hybrid version instead.
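If switching models is not an option, one thing you can still do is hide the reasoning on the client side after the response arrives. The sketch below is a minimal illustration, not a vLLM feature: strip_reasoning is a hypothetical helper name, and it assumes the reasoning trace ends with a literal </think> marker in the returned text (the opening <think> tag may be absent, since the chat template can place it in the prompt).

```python
def strip_reasoning(text: str, end_tag: str = "</think>") -> str:
    """Return only the text that follows the last reasoning end tag.

    Assumption: the model closes its reasoning with `end_tag`. If the tag
    is missing, the text is returned unchanged apart from trimming.
    """
    # rpartition splits on the LAST occurrence of end_tag; when the tag is
    # not found, `sep` is empty and `tail` holds the whole input.
    _head, sep, tail = text.rpartition(end_tag)
    return tail.strip() if sep else text.strip()


if __name__ == "__main__":
    raw = "<think>The user wants a greeting.</think>\n\nHello!"
    print(strip_reasoning(raw))
```

This only hides the trace. The model still generates the reasoning tokens, so latency and token usage are unchanged, and a response cut off before </think> is returned as-is.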
