General questions on structured output backend

If the structured output backend is not set, or is set to auto, vLLM automatically selects a backend based on the request content and the capabilities of the available backend libraries. The selection is opinionated and may change between releases; it aims to use the most suitable and performant backend for each structured output request (e.g., xgrammar for JSON schemas, outlines for certain grammars) (engine_args docs, structured outputs docs).
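For example, a request that supplies a JSON schema via vLLM's guided_json field on the OpenAI-compatible server would typically be routed to a schema-capable backend such as xgrammar under auto selection. A minimal sketch of such a request body, assuming a locally running server (the model name, prompt, and schema are illustrative):

```python
import json

# Illustrative JSON schema for the constrained output.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Request body for the OpenAI-compatible /v1/chat/completions endpoint.
# "guided_json" is vLLM's extension field for JSON-schema constrained output;
# with the backend left on auto, vLLM picks the backend itself.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # example model name
    "messages": [
        {"role": "user", "content": "Describe a person as JSON."}
    ],
    "guided_json": schema,
}

print(json.dumps(payload, indent=2))
```

Note that no backend is named anywhere in the request: the choice happens server-side based on the constraint type.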

You can override this behavior by explicitly setting --guided-decoding-backend to a specific backend (e.g., xgrammar, outlines, guidance, lm-format-enforcer) at engine startup. Per-request backend selection is not supported in vLLM V1 (engine_args docs, structured outputs docs).
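A sketch of pinning the backend at startup with the vllm CLI (the model name is illustrative):

```shell
# Force all structured output requests to use xgrammar for this engine.
# Per-request backend selection is not available in vLLM V1, so this
# choice applies to every request the server handles.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --guided-decoding-backend xgrammar
```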
