How to generate just one token?

I want the model to generate just one token and return it. What should I do? I'm using the OpenAI Python library as the client.

Set max_tokens=1 in the request you send to the vLLM server with the OpenAI Python client. vLLM's OpenAI-compatible API honors the max_tokens sampling parameter, so generation stops after a single token.

Example:

from openai import OpenAI

# Point the client at the local vLLM server; vLLM ignores the API key,
# so any placeholder such as "EMPTY" works.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# max_tokens=1 caps the completion at a single generated token.
response = client.completions.create(
    model="your-model-name",
    prompt="Your prompt here",
    max_tokens=1,
)
print(response.choices[0].text)
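If you serve a chat model, the same cap applies to the chat endpoint. A minimal sketch, assuming a vLLM server at http://localhost:8000/v1 serving a model named "your-model-name" (both placeholders, as above):

```python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# max_tokens=1 works the same way for chat completions:
# the server stops after emitting one token.
response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=1,
)
print(response.choices[0].message.content)
```

Note that the single generated token may be an end-of-sequence token, in which case the returned text can be empty.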
