How to get structured outputs in vllm?

The client.responses.parse method with the text_format parameter is supported in vLLM for models that implement structured outputs, as shown in the vLLM test suite. However, official documentation does not guarantee this works for all models, and Qwen3 support for this exact API is not explicitly confirmed—most vLLM examples use client.chat.completions.create with structured_outputs or response_format instead. See test_structured_output_with_parse.

Would you like a step-by-step example or guidance for your specific model?

Sources:

Hint: Mention @RunLLM in the post for followups.