I would like to enable or disable thinking per request, but I do not know how to do that: whatever I try (e.g. /no_think), the model always reasons about the prompt.
Is it even possible to turn on or off the reasoning of Qwen3-VL Thinking models?
Yes — for most Qwen3-series models served with vLLM, you can toggle "thinking" (reasoning) per request by passing enable_thinking in the request body: set extra_body={"chat_template_kwargs": {"enable_thinking": False}} to disable reasoning for that specific request, or True to enable it. However, the Qwen3-VL-Thinking variants use a chat template that always inserts the <think> tag, so this flag may not take effect and the model can still emit reasoning content regardless. This is a known limitation of how the chat template is constructed for these models; using the DeepSeek-R1 reasoning parser is sometimes suggested as a workaround, but it does not fully resolve the issue for all Qwen3-VL-Thinking variants (source, source).
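Here is a minimal sketch of the per-request toggle, assuming a local vLLM OpenAI-compatible server; the model name, URL, and prompt below are placeholders, not specific to your deployment:

```python
# Minimal sketch: per-request control of Qwen3 "thinking" via vLLM's
# OpenAI-compatible server. Model name and base_url are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumes a local vLLM server
    api_key="EMPTY",                      # vLLM ignores the key by default
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # placeholder; use the model you actually serve
    messages=[{"role": "user", "content": "Briefly explain KV caching."}],
    # vLLM forwards chat_template_kwargs into the model's chat template;
    # enable_thinking=False asks the template to suppress the <think> block.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
```

For Qwen3-VL-Thinking, the same request is accepted, but because the template hard-codes the <think> tag, the output may still contain reasoning.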
In summary: for most Qwen3 models you can control reasoning per request with enable_thinking, but Qwen3-VL-Thinking's template enforces reasoning mode, so disabling it per request is not reliably supported in vLLM as of now (source, source). If you only need the final answer, stripping the reasoning span client-side is a pragmatic fallback; see the sketch below.
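One caveat for this fallback: in some Thinking templates the opening <think> tag is pre-filled into the prompt, so only the closing </think> appears in the generated text. The sketch below keys on the closing tag for that reason; it is a client-side heuristic under that assumption, not an official API:

```python
# Hedged fallback: strip the reasoning span from the model output.
# Assumes reasoning ends with a literal "</think>" tag; the opening
# "<think>" may have been pre-filled by the template and thus be absent.
def strip_thinking(text: str) -> str:
    head, sep, tail = text.rpartition("</think>")
    # If no closing tag is found, return the text unchanged.
    return tail.lstrip() if sep else text

print(strip_thinking("<think>chain of thought...</think>\nThe answer is 42."))
# -> "The answer is 42."
```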