General questions on structured output backend

If the structured output backend is not set, or is set to auto, vLLM automatically selects a backend based on the request content and the capabilities of the available backend libraries. The selection is opinionated and may change between releases; it aims to use the most suitable and performant backend for each structured output request (e.g., xgrammar for JSON schemas, outlines for certain grammars) (engine_args docs, structured outputs docs).
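For example, a request that supplies a JSON schema via vLLM's guided_json field on the OpenAI-compatible server would typically be routed to a schema-capable backend such as xgrammar under auto selection. A minimal sketch of such a request body, assuming a locally running server (the model name, prompt, and schema are illustrative):

```python
import json

# Illustrative JSON schema for the constrained output.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Request body for the OpenAI-compatible /v1/chat/completions endpoint.
# "guided_json" is vLLM's extension field for JSON-schema constrained output;
# with the backend left on auto, vLLM picks the backend itself.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # example model name
    "messages": [
        {"role": "user", "content": "Describe a person as JSON."}
    ],
    "guided_json": schema,
}

print(json.dumps(payload, indent=2))
```

Note that no backend is named anywhere in the request: the choice happens server-side based on the constraint type.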

You can override this behavior by explicitly setting --guided-decoding-backend to a specific backend (e.g., xgrammar, outlines, guidance, lm-format-enforcer) at engine startup. Per-request backend selection is not supported in vLLM V1 (engine_args docs, structured outputs docs).
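A sketch of pinning the backend at startup with the vllm CLI (the model name is illustrative):

```shell
# Force all structured output requests to use xgrammar for this engine.
# Per-request backend selection is not available in vLLM V1, so this
# choice applies to every request the server handles.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --guided-decoding-backend xgrammar
```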
