Are there any known methods for “guiding” a model to output a special token (like </think>) rather than having it select tokens for substrings of the special token?
Yes. To guide a model to output a special token (e.g., “</think>”) as an atomic unit rather than as substrings, you must ensure the token is defined as a special token in the tokenizer vocabulary. If the tokenizer does not treat it as special, the model may generate it as multiple sub-tokens (e.g., “<th”, “ink”, “>”). This has been observed, for example, with Qwen3-FP8 models, where special tokens were split unless properly registered in the tokenizer config. Using a “slow” tokenizer or updating the tokenizer config to include the special token can help, but the root cause is tokenizer vocabulary alignment, not vLLM logic itself. See the discussion in vllm/issues/17468.
Would you like more detail on how to update the tokenizer or handle this in vLLM?
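For context, here is a minimal sketch of registering the token with a Hugging Face tokenizer so it encodes to a single ID. The checkpoint name is a placeholder, and note that Qwen3 tokenizers already define </think> as special, so this mainly applies to tokenizers that do not:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; substitute the model you are actually serving.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# Register "</think>" as a special token so it encodes atomically.
tokenizer.add_special_tokens({"additional_special_tokens": ["</think>"]})

# Verify the token now maps to exactly one ID.
ids = tokenizer.encode("</think>", add_special_tokens=False)
assert len(ids) == 1, f"still split into multiple pieces: {ids}"
```

If the token is genuinely new to the vocabulary, the model's embedding matrix must also be resized to match (`model.resize_token_embeddings(len(tokenizer))`) before the new ID is usable.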
Not a solution. The linked issue is not related to guided decoding. It does describe how special tokens get split sometimes.
You’re correct: the linked issue is about special token splitting, not guided decoding. For actual guided decoding (forcing the model to output a specific token at a certain point), vLLM supports custom logits processors, which can manipulate the logits to strongly favor a special token when needed. For example, you can implement a logits processor that, after a certain number of generated tokens or under specific conditions, sets all logits to -inf except for the special token, effectively forcing its generation. See the example in vllm/issues/15418.
Would you like a code example of such a logits processor for guided special token output?
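A minimal sketch of such a processor, assuming vLLM's older callable logits-processor interface (the V0 engine passes the token IDs generated so far plus the raw logits tensor); the token ID constant is a hypothetical value you would look up from your tokenizer:

```python
import torch

# Hypothetical ID for the "</think>" special token; in practice look it
# up with tokenizer.convert_tokens_to_ids("</think>").
THINK_END_ID = 151668  # assumption: Qwen3-style vocabulary

def make_force_think_end(budget: int):
    """Build a logits processor that forces </think> once the model has
    generated `budget` tokens without emitting it on its own."""
    def processor(generated_token_ids, logits):
        if len(generated_token_ids) >= budget and THINK_END_ID not in generated_token_ids:
            # Mask every token except </think>, so it is the only choice.
            forced = torch.full_like(logits, float("-inf"))
            forced[THINK_END_ID] = 0.0
            return forced
        return logits
    return processor
```

With the V0 engine this plugs in via `SamplingParams(logits_processors=[make_force_think_end(512)])`; the V1 engine replaced this hook with a class-based interface, so treat the callable form as an assumption about your vLLM version.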
The guidance backend has dedicated support for special tokens; see docs/syntax.md in the guidance-ai/llguidance repository on GitHub.
I believe xgrammar treats them as text.
You are correct: the guidance backend (llguidance) provides explicit syntax for special tokens (e.g., <|special_token|>) in grammars, allowing you to guide generation to output them as atomic units. In contrast, xgrammar currently treats special tokens as plain text and does not have dedicated support for atomic special token handling in its grammar format. See the llguidance special tokens docs and the xgrammar discussion.
Would you like more detail on how to use special tokens with the guidance backend in vLLM?
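A hedged sketch of how this might look in vLLM with the guidance backend. The grammar uses llguidance's Lark-like dialect; `<|think_end|>` is a hypothetical special-token reference (llguidance's docs/syntax.md describes the exact forms, including referencing tokens by ID), and the model name and rule names are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Lark-style llguidance grammar: free-form reasoning, then the atomic
# special token, then the final answer. <|think_end|> is a hypothetical
# reference; match it to a real special token in your tokenizer.
grammar = r"""
start: reasoning <|think_end|> answer
reasoning: /[^<]*/
answer: /.*/
"""

# Placeholder model; guided_decoding_backend selects llguidance.
llm = LLM(model="Qwen/Qwen3-8B", guided_decoding_backend="guidance")

params = SamplingParams(
    max_tokens=512,
    guided_decoding=GuidedDecodingParams(grammar=grammar),
)

outputs = llm.generate(["Solve: what is 17 * 23?"], params)
print(outputs[0].outputs[0].text)
```

Because the special token is referenced in the grammar itself, the backend constrains sampling to emit the single atomic token ID at that position, rather than letting the model spell it out as sub-tokens.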