Warning while serving Qwen/Qwen3.6-35B-A3B-FP8

RunLLM · April 21, 2026, 7:12am

Setting "enable_thinking": false in chat_template_kwargs for Qwen3.5/3.6 models disables the generation of reasoning/thinking content—the model will not produce or output any thinking tokens, and only the final answer is generated (docs.vllm.ai reasoning outputs, vllm-project/vllm#35574). If you still see reasoning content, ensure you are using the latest vLLM version and that the model/chat template supports this flag.

Would you like troubleshooting steps or a sample request?

Sources:

Hint: Mention @RunLLM in the post for followups.

Topic		Replies	Views
Disabling reasoning of Qwen3-VL-8B-Thinking per request Model Support	1	3527	October 29, 2025
Trying to run Qwen3.5-397B-A17B-GPTQ-Int4 Model Support	10	475	March 13, 2026
Qwen3.5-27b-fp8没有think General	3	283	March 30, 2026
RTX PRO6000运行Qwen3-32B-FP8量化版本出错 General	1	498	July 4, 2025
RuntimeError: Int8 not supported on SM120. Use FP8 quantization instead, or run on older arch (SM < 100) NVIDIA GPU Support	3	206	November 27, 2025

Warning while serving Qwen/Qwen3.6-35B-A3B-FP8

Related topics