How does vLLM handle JSON schemas for guided prompting?

Hey,

I would like to know whether a JSON schema added via GuidedDecodingParams is automatically added to the prompt. And in this context: does “description” do anything? Will it be handed over to the prompt?

from typing import List
from pydantic import BaseModel, Field

class ReviewMetaProperties(BaseModel):
    properties: List[str] = Field(..., min_length=10, max_length=10, description="A list of exactly 10 unique properties that are relevant for a review persona on TripAdvisor.")

Best regards,

Nils

No, a JSON schema provided via GuidedDecodingParams (e.g., using the json argument) is not automatically added to the prompt. The schema is only used to constrain the model’s output structure during decoding, not to inform the model via the prompt text. The description field in a Pydantic schema (or JSON schema) is not injected into the prompt either; it is documentation within the schema and has no effect on the constrained decoding. You must manually include any schema details or descriptions you want the model to see in the prompt itself. See docs example and protocol source.
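
For example, here is a minimal sketch of embedding the schema in the prompt yourself, reusing the ReviewMetaProperties model from above; the prompt wording is illustrative and the model name is taken from later in this thread:

import json
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

schema = ReviewMetaProperties.model_json_schema()  # includes the "description" text

# Guided decoding alone never shows the schema to the model, so embed it
# (descriptions included) in the prompt text manually.
prompt = (
    "Answer with a JSON object matching this schema:\n"
    f"{json.dumps(schema, indent=2)}\n\nAnswer: "
)

llm = LLM(model="google/gemma-3-27b-it")
params = SamplingParams(guided_decoding=GuidedDecodingParams(json=schema))
output = llm.generate([prompt], sampling_params=params)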

Would you like more detail or a code example?


How can I set individual guided decoding params for each prompt in the prompts array?

You cannot set individual GuidedDecodingParams for each prompt in a single batched call; vLLM currently only supports a single set of guided decoding parameters per batch. This limitation is confirmed in this discussion, where the maintainers suggest sending prompts one by one or grouping prompts that share the same guided decoding parameters.
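
As a minimal sketch of the grouping approach, assuming an already-constructed llm object and two hypothetical placeholder schemas:

from collections import defaultdict
import json
from vllm import SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Hypothetical per-prompt schemas; prompts 1 and 3 share schema_a.
schema_a = {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}
schema_b = {"type": "object", "properties": {"age": {"type": "integer"}}, "required": ["age"]}
requests = [
    ("Describe a hotel reviewer: ", schema_a),
    ("Describe an airline reviewer: ", schema_b),
    ("Describe a restaurant reviewer: ", schema_a),
]

# Group prompts by schema so each group can be sent as one batched call.
groups = defaultdict(list)
for prompt, schema in requests:
    groups[json.dumps(schema, sort_keys=True)].append(prompt)

results = []
for schema_key, prompts in groups.items():
    params = SamplingParams(
        guided_decoding=GuidedDecodingParams(json=json.loads(schema_key)))
    results.extend(llm.generate(prompts, sampling_params=params))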

Would you like more detail or a code workaround?


Which code do I need to change to allow it, though? That would be the preferred way, but if it's really impossible (I'm pretty sure it is possible), show me a workaround.

To support individual GuidedDecodingParams per prompt in a batch, you would need to modify the batching and request handling logic so that each prompt can carry its own SamplingParams (with its own GuidedDecodingParams). Currently, vLLM expects a single SamplingParams object for all prompts in a batch, as seen in the sampling API and confirmed in this issue. This would require changes in the batching logic, the generate method, and possibly the backend model runner to accept and process a list of SamplingParams objects.

As a workaround, you can loop over your prompts and call llm.generate() separately for each prompt with its own SamplingParams, as suggested in this comment.
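
Here is a minimal sketch of that loop, again assuming an existing llm object; the prompts and schemas are placeholders:

from vllm import SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Placeholder schemas; in practice each prompt gets its own real schema.
schemas = [
    {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    {"type": "object", "properties": {"stars": {"type": "integer"}}, "required": ["stars"]},
]
prompts = ["Name a city as JSON: ", "Rate a hotel as JSON: "]

outputs = []
for prompt, schema in zip(prompts, schemas):
    params = SamplingParams(
        temperature=0.8,
        max_tokens=256,
        guided_decoding=GuidedDecodingParams(json=schema),
    )
    # One generate call per prompt: each prompt gets its own guided
    # decoding parameters, at the cost of losing cross-prompt batching.
    outputs.append(llm.generate([prompt], sampling_params=params)[0])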

Would you like a code example for the workaround or more detail on what to change in the codebase?


Show me all parts of the package I need to change for individual GuidedDecodingParams.

It seems likely that to support individual GuidedDecodingParams per prompt, you would need to change several core parts of vLLM:

  1. LLM.generate API: Update the generate method to accept a list of SamplingParams (each with its own GuidedDecodingParams) and ensure the list is paired one-to-one with the prompts (a sketch of this pairing logic follows the list). See the LLM.generate signature and llm.py source.
  2. Batching Logic: Modify batching so that each prompt in a batch can carry its own SamplingParams. This likely involves changes in the request queueing and scheduling logic.
  3. Model Runner: Ensure the backend model runner and sampling logic can handle a list of SamplingParams and apply the correct guided decoding for each prompt. This may require changes in the sampling and decoding modules.

This is not currently supported and would require non-trivial changes throughout the codebase, as discussed in this issue.
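
To make step 1 concrete, here is a hypothetical wrapper that illustrates the pairing logic the API change would need. It is not vLLM's implementation, just a sketch that falls back to per-prompt calls:

from typing import Sequence, Union
from vllm import LLM, SamplingParams

def generate_with_per_prompt_params(
    llm: LLM,
    prompts: Sequence[str],
    params: Union[SamplingParams, Sequence[SamplingParams]],
):
    # Hypothetical helper: broadcast a single SamplingParams, or pair
    # one SamplingParams with each prompt.
    if isinstance(params, SamplingParams):
        params = [params] * len(prompts)
    if len(params) != len(prompts):
        raise ValueError("Need exactly one SamplingParams per prompt.")
    # Today this loop is just the workaround; a real change would push
    # the per-prompt params into the scheduler and model runner instead.
    return [llm.generate([p], sampling_params=sp)[0]
            for p, sp in zip(prompts, params)]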

Would you like a step-by-step breakdown or a code example for the workaround?


This is what the code looks like:

from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams
from pydantic import BaseModel, Field, create_model
from typing import Literal, List, Optional
import json

llm = LLM(model="google/gemma-3-27b-it", quantization="bitsandbytes", trust_remote_code=True, max_model_len=2048, max_num_seqs=100)

TYPE = "review on airlines posted on TripAdvisor"

json_schema = {
    'properties': {
        'gender': {'enum': ['male', 'female', 'non-binary', 'other', 'prefer_not_to_say'], 'title': 'Gender', 'type': 'string'},
        'prename': {'title': 'Prename', 'type': 'string'},
        'surname': {'title': 'Surname', 'type': 'string'},
        'age': {'maximum': 120, 'minimum': 0, 'title': 'Age', 'type': 'integer'},
        'overall_mood': {'enum': ['joy', 'trust', 'fear', 'surprise', 'sadness', 'disgust', 'anger', 'anticipation'], 'title': 'Overall Mood', 'type': 'string'},
        'persona_description': {'title': 'Persona Description', 'type': 'string'},
        'review_title': {'title': 'Review Title', 'type': 'string'},
        'travel_class': {'title': 'Travel Class', 'type': 'string'},
        'trip_type': {'title': 'Trip Type', 'type': 'string'},
        'number_of_passengers': {'title': 'Number Of Passengers', 'type': 'string'},
        'flight_distance': {'title': 'Flight Distance', 'type': 'string'},
        'reason_for_travel': {'title': 'Reason For Travel', 'type': 'string'},
        'airline_loyalty_program_member': {'title': 'Airline Loyalty Program Member', 'type': 'string'},
        'previous_airline_experience': {'title': 'Previous Airline Experience', 'type': 'string'},
        'booking_method': {'title': 'Booking Method', 'type': 'string'},
        'meal_preference': {'title': 'Meal Preference', 'type': 'string'},
        'entertainment_preference': {'title': 'Entertainment Preference', 'type': 'string'},
    },
    'required': ['gender', 'prename', 'surname', 'age', 'overall_mood', 'persona_description', 'review_title', 'travel_class', 'trip_type', 'number_of_passengers', 'flight_distance', 'reason_for_travel', 'airline_loyalty_program_member', 'previous_airline_experience', 'booking_method', 'meal_preference', 'entertainment_preference'],
    'title': 'ReviewMeta',
    'type': 'object',
}

guided_decoding_params = GuidedDecodingParams(json=json_schema)

output = llm.generate(
    [f"Return the metadata of a person who writes a {TYPE}: "],
    sampling_params=SamplingParams(
        temperature=0.8,
        max_tokens=8192,
        seed=2,
        guided_decoding=guided_decoding_params)
)
authors = output[0].outputs[0].text
# convert author to object
import json
authors = json.loads(authors)

Output:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
Cell In[7], line 14
     12 # convert author to object
     13 import json
---> 14 authors = json.loads(authors)

File ~/miniconda3/envs/llm_tuner/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    341     s = s.decode(detect_encoding(s), 'surrogatepass')
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:
    348     cls = JSONDecoder

File ~/miniconda3/envs/llm_tuner/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    332 def decode(self, s, _w=WHITESPACE.match):
    333     """Return the Python representation of ``s`` (a ``str`` instance
    334     containing a JSON document).
    335 
    336     """
--> 337     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338     end = _w(s, end).end()
    339     if end != len(s):

File ~/miniconda3/envs/llm_tuner/lib/python3.10/json/decoder.py:353, in JSONDecoder.raw_decode(self, s, idx)
    344 """Decode a JSON document from ``s`` (a ``str`` beginning with
    345 a JSON document) and return a 2-tuple of the Python
    346 representation and the index in ``s`` where the document ended.
   (...)
    350 
    351 """
    352 try:
--> 353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:
    355     raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting ',' delimiter: line 984 column 1 (char 5140)

Output of output[0].outputs[0].text:

'{ "gender": "female", "prename": "Laura", "surname": "Smith", "age": 35, "overall_mood": "disgust", "persona_description": "A meticulous and independent traveler who values comfort and reliability. She often travels for leisure."\n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    
\n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n    \n'

Your error is caused by the model emitting incomplete, invalid JSON: the output breaks off mid-object and is padded with long runs of whitespace, so json.loads() fails. This is a known limitation of vLLM’s guided decoding: it does not guarantee the output will always be valid JSON, especially for long or complex schemas, or when max_tokens is large. (In your snippet, max_tokens=8192 also exceeds the configured max_model_len=2048, so generation likely stops at the context limit before the closing brace can be emitted.) See vLLM issue #6905 and vLLM issue #20060.

A common workaround is to post-process the output to repair or retry JSON parsing, for example using the json_repair library, or to check and trim the output before parsing (see the sketch below). You can also try reducing max_tokens or making your schema stricter, but this does not fully solve the problem.
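
A minimal sketch of such post-processing, assuming the third-party json-repair package (pip install json-repair):

import json

def parse_model_json(text: str) -> dict:
    # First try plain json.loads on the stripped text; if the object is
    # truncated (as in the traceback above), fall back to json_repair,
    # which can close unterminated strings and objects.
    text = text.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        from json_repair import repair_json
        return json.loads(repair_json(text))

authors = parse_model_json(output[0].outputs[0].text)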

Would you like a code example for robust JSON post-processing?
