I noticed there's a min_tokens parameter in vLLM inference. This sampling parameter is exactly what I need. However, I am serving vLLM as an OpenAI-compatible API server, and clients use the OpenAI API to request results. How can I pass the min_tokens parameter via the OpenAI API?
I am calling it with the following code snippet, but it looks like this parameter is not allowed:
response = await client.chat.completions.create(
model="deepseek-reasoner",
messages=messages,
stream=True,
    min_tokens=4096,
)
TypeError: AsyncCompletions.create() got an unexpected keyword argument 'min_tokens'. Did you mean 'max_tokens'?
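One approach that may work: the official OpenAI Python client rejects unknown keyword arguments, but it accepts an extra_body parameter whose fields are merged into the request JSON as-is, and vLLM's OpenAI-compatible server picks up extra fields like min_tokens as sampling parameters. A minimal sketch (the model name and message are placeholders from the question; the actual client call is commented out since it needs a running server):

```python
# Build the request kwargs. min_tokens is not part of the OpenAI schema,
# so it goes into extra_body, which the client serializes into the
# request body unchanged; vLLM reads it as a sampling parameter.
request_kwargs = {
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    "extra_body": {"min_tokens": 4096},
}

# With an AsyncOpenAI client pointed at the vLLM server:
# response = await client.chat.completions.create(**request_kwargs)
```

This keeps the standard OpenAI fields as normal keyword arguments and confines the vLLM-specific parameter to extra_body, so the same client code still works against servers that ignore unknown fields.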