How to pass vLLM specific parameters via openai API from clients

I noticed there’s a min_tokens parameter in vLLM’s sampling parameters, and it’s exactly what I need. But I am serving vLLM as an OpenAI-API-compatible server, and clients use the OpenAI API to request results. How can I pass the min_tokens parameter via the OpenAI API?

I am calling it with the following code snippet, and it looks like this parameter is not allowed:

    response = await client.chat.completions.create(
            model="deepseek-reasoner",
            messages=messages,
            stream=True,
            min_tokens=4096,
    )

    TypeError: AsyncCompletions.create() got an unexpected keyword argument 'min_tokens'. Did you mean 'max_tokens'?

The OpenAI Python client does not accept min_tokens as a named keyword argument, but vLLM’s OpenAI-compatible server does support it as an extra parameter. To use min_tokens, pass it via the extra_body argument of the request, like this:

    response = await client.chat.completions.create(
        model="deepseek-reasoner",
        messages=messages,
        stream=True,
        extra_body={"min_tokens": 4096},
    )

This forwards min_tokens to the vLLM server, enabling the minimum-token constraint for your request. Per the vLLM documentation, extra (non-OpenAI) parameters such as min_tokens must be sent this way rather than as top-level keyword arguments.
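The mechanism behind this is simple: the client merges the extra_body dictionary into the JSON payload it POSTs to the server, alongside the standard parameters. A minimal sketch of that merge (the dicts below are illustrative, not the client’s exact internals):

```python
# Sketch: extra_body keys are merged into the JSON request body
# alongside the standard OpenAI parameters. (Illustrative only --
# not the OpenAI client's exact internal code.)
standard_params = {
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}
extra_body = {"min_tokens": 4096}

# Later keys win on conflict, so extra_body values reach the server
# exactly as written; vLLM then reads min_tokens from the body.
payload = {**standard_params, **extra_body}
print(payload["min_tokens"])  # 4096
```

This is also why the server, not the client, decides which extra keys are valid: the client passes them through unvalidated, and an unsupported key is rejected (or ignored) by the backend.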

Would you like more detail or a full example?

Wow, this agent bot is so smart.