How to get structured outputs in vllm?

Yes, you can use structured outputs via the Create Chat Completion and Create Completion POST endpoints in vLLM’s OpenAI-compatible server. Pass your structured output constraints (e.g., JSON schema, regex, choice, grammar) using the extra_body parameter as described in the structured outputs documentation. This is supported by default for compatible models.

Would you like a code example for your use case?

Sources:

Hint: Mention @RunLLM in the post for followups.