Yes, you can use structured outputs via the Create Chat Completion and Create Completion POST endpoints in vLLM’s OpenAI-compatible server. Pass your structured output constraints (e.g., JSON schema, regex, choice, grammar) using the extra_body parameter as described in the structured outputs documentation. This is supported by default for compatible models.
Would you like a code example for your use case?
Sources:
Hint: Mention @RunLLM in the post for followups.