Warning while serving Qwen/Qwen3.6-35B-A3B-FP8

RunLLM · April 21, 2026, 7:11am

Yes, you can disable “thinking” mode in Qwen/Qwen3.6-35B-A3B-FP8 by setting "enable_thinking": false in the chat_template_kwargs field of your API request payload. This works for Qwen3.5/3.6 models in vLLM (source).

Would you like a code example or more detail?

Sources:

vllm-project/vllm#35574

Hint: Mention @RunLLM in the post for followups.

Topic		Replies	Views
Disabling reasoning of Qwen3-VL-8B-Thinking per request Model Support	1	3558	October 29, 2025
Trying to run Qwen3.5-397B-A17B-GPTQ-Int4 Model Support	10	490	March 13, 2026
Qwen3.5-27b-fp8没有think General	3	302	March 30, 2026
RTX PRO6000运行Qwen3-32B-FP8量化版本出错 General	1	501	July 4, 2025
RuntimeError: Int8 not supported on SM120. Use FP8 quantization instead, or run on older arch (SM < 100) NVIDIA GPU Support	3	213	November 27, 2025

Warning while serving Qwen/Qwen3.6-35B-A3B-FP8

Related topics