I am currently using vLLM to run inference with the gpt-oss-20b model. I want to set the model's reasoning effort to low. I read that vLLM supports a parameter called reasoning_effort. Can I use that parameter when I am using vLLM from Python code rather than through the server?
No, you cannot use the reasoning_effort parameter when running vLLM via the Python LLM class for offline inference; it is only supported through the OpenAI-compatible server endpoints (e.g., /v1/chat/completions or /v1/responses). The parameter is not available in SamplingParams or the LLM.generate API for direct Python usage, only in HTTP requests to the server, as confirmed in the issues linked below.
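That said, if you stay with offline inference, you may still be able to steer the model's effort through the system prompt: the gpt-oss model card describes a "Reasoning: low/medium/high" line in the system message that the model's chat template recognizes. This is an assumption drawn from the model card, not something the linked issues confirm, so verify it against your chat template. A minimal sketch of building such messages (the actual LLM.chat call is commented out since it requires a GPU and the model weights):

```python
# Hypothetical sketch: request low reasoning effort via the system prompt
# when using vLLM offline. Assumes the gpt-oss chat template honors a
# "Reasoning: <level>" system message -- check the model card for your version.
def build_messages(user_prompt: str, effort: str = "low") -> list[dict]:
    """Prepend a system message asking for the given reasoning effort."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return [
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("What is the capital of France?", effort="low")

# These messages would then be passed to vLLM's chat API, e.g.:
# from vllm import LLM, SamplingParams
# llm = LLM(model="openai/gpt-oss-20b")
# outputs = llm.chat(messages, SamplingParams(max_tokens=256))
print(messages[0]["content"])
```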
If you need to control reasoning_effort, you must run vLLM as a server and send requests via the OpenAI-compatible API. Would you like more detail or example code for using the server API?
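As a sketch of the server route: after starting the server (e.g. `vllm serve openai/gpt-oss-20b`), you send an OpenAI-compatible chat completion request with reasoning_effort in the body. The host/port and model name below are assumptions for illustration; the actual HTTP call is shown commented out so the snippet stands alone:

```python
import json

# Build an OpenAI-compatible /v1/chat/completions request body with
# reasoning_effort, as accepted by the vLLM server per the linked issues.
payload = {
    "model": "openai/gpt-oss-20b",          # must match the served model name
    "messages": [{"role": "user", "content": "Explain KV caching briefly."}],
    "reasoning_effort": "low",              # "low" | "medium" | "high"
    "max_tokens": 256,
}
body = json.dumps(payload)

# POST it with any HTTP client, assuming the server runs on localhost:8000:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
print(body)
```

The same field also works through the official openai Python client by passing it via `extra_body` if your SDK version does not expose reasoning_effort directly.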
Sources:
- https://github.com/vllm-project/vllm/issues/22359
- https://github.com/vllm-project/vllm/issues/24292
- https://github.com/vllm-project/vllm/issues/22809