Init DeepSeek-R1 using Offline Batched Inference

Hi all, I am following this guide to run vLLM using Python. However, as I understand it, DeepSeek models also need "--enable-reasoning --reasoning-parser deepseek_r1", as in this guide. I am just wondering how to enable reasoning with the Python API?

To enable reasoning for DeepSeek models with vLLM, you must start the vLLM server with the --reasoning-parser deepseek_r1 flag (the --enable-reasoning flag is now deprecated and no longer required). The Python client itself does not control reasoning; it simply connects to a server that was started with the appropriate reasoning parser. There is no client-side parameter to enable reasoning; this is a server-side configuration only.

For example, launch the server with:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek_r1

Then, use the OpenAI-compatible Python client as usual. For supported models, the server returns outputs with a reasoning_content field. See the Reasoning Outputs documentation and example code for details.
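As a minimal client-side sketch, assuming the server launched above is reachable at http://localhost:8000/v1 (the default port) and that any placeholder API key is accepted; note that reasoning_content is a vLLM extension to the OpenAI response schema:

from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "How many r's are in the word strawberry?"}],
)

message = response.choices[0].message
# With --reasoning-parser deepseek_r1, the server separates the chain of
# thought (reasoning_content) from the final answer (content).
print("Reasoning:", message.reasoning_content)
print("Answer:", message.content)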

Would you like more detail or a full code example?


How can I do this with the native Python API of vLLM, not the OpenAI API?

There is no documented way to enable reasoning outputs with the native vLLM Python API (the LLM class or similar); reasoning support is only exposed through the OpenAI-compatible server interface, not the direct Python API. The native API does not accept parameters for reasoning or reasoning parsers, and all official examples and documentation use the OpenAI-compatible interface for this feature (see docs, see example).

If you need parsed reasoning outputs, you must use the OpenAI-compatible HTTP server and client; the native Python API does not support this as of the latest documentation. A rough manual workaround is sketched below.
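As an unofficial workaround (not a documented vLLM feature), you can generate text with the native LLM class and split the DeepSeek-R1 <think>...</think> tags yourself, which is roughly what the deepseek_r1 parser does on the server side. A minimal sketch, assuming the model emits a closing </think> tag:

from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
sampling_params = SamplingParams(temperature=0.6, max_tokens=1024)

outputs = llm.generate(["How many r's are in the word strawberry?"], sampling_params)
text = outputs[0].outputs[0].text

# DeepSeek-R1 models wrap their reasoning in <think>...</think> tags;
# split on the closing tag to separate reasoning from the final answer.
if "</think>" in text:
    reasoning_content, content = text.split("</think>", 1)
    reasoning_content = reasoning_content.replace("<think>", "").strip()
    content = content.strip()
else:
    reasoning_content, content = None, text.strip()

print("Reasoning:", reasoning_content)
print("Answer:", content)

This only mimics the server-side parser for this particular tag format; other reasoning models may use different delimiters.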

Would you like more detail or a workaround suggestion?
