@RunLLM But making “enable_thinking”: false is not avoiding thinking in the model it is just not streaming the thinking tokens as output
Ranjith
7
Related topics
| Topic | Replies | Views | Activity | |
|---|---|---|---|---|
| Disabling reasoning of Qwen3-VL-8B-Thinking per request | 1 | 3713 | October 29, 2025 | |
| Trying to run Qwen3.5-397B-A17B-GPTQ-Int4 | 10 | 568 | March 13, 2026 | |
| Qwen3.5-27b-fp8没有think | 3 | 367 | March 30, 2026 | |
| RTX PRO6000运行Qwen3-32B-FP8量化版本出错 | 1 | 510 | July 4, 2025 | |
| RuntimeError: Int8 not supported on SM120. Use FP8 quantization instead, or run on older arch (SM < 100) | 1 | 230 | November 19, 2025 |