If you are still seeing “thinking” behavior even after setting --default-chat-template-kwargs '{"enable_thinking": false}', it may be due to the specific Qwen3-Thinking model you are using. Some Qwen3-Thinking models are designed to always generate reasoning traces, regardless of the enable_thinking flag, because their chat template or training enforces thinking mode by default. This is a known limitation discussed in vLLM issues and Qwen3 model docs.
In summary: For Qwen3-Thinking-2507 models, disabling thinking via enable_thinking: false may not work as expected due to model/template design. There is no server-side workaround for these models; you must use a non-thinking or hybrid version if you want to reliably disable reasoning.
Sources:
- https://github.com/vllm-project/vllm/issues/27118
- https://docs.vllm.ai/en/latest/features/reasoning_outputs/
Hint: Mention @RunLLM in the post for followups.