Are there any known methods for “guiding” a model to output a special token (like </think>) rather than having it select tokens for substrings of the special token?
Yes. To guide a model to output a special token (e.g., “</think>”) as an atomic unit rather than as substrings, you must ensure the token is defined as a special token in the tokenizer vocabulary. If the tokenizer does not treat it as special, the model may generate it as multiple sub-tokens (e.g., “<th”, “ink”, “>”). This has been observed, for example, with Qwen3-FP8 models, where special tokens were split unless properly registered in the tokenizer config. Using a “slow” tokenizer or updating the tokenizer config to include the special token can help, but the root cause is tokenizer vocabulary alignment, not vLLM logic itself. See the discussion in vllm/issues/17468.
Would you like more detail on how to update the tokenizer or handle this in vLLM?
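For context, here is a minimal sketch of registering the token with a Hugging Face tokenizer so it encodes to a single ID. The checkpoint name is a placeholder, and note that Qwen3 tokenizers already define </think> as special, so this mainly applies to tokenizers that do not:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; substitute the model you are actually serving.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# Register "</think>" as a special token so it encodes atomically.
tokenizer.add_special_tokens({"additional_special_tokens": ["</think>"]})

# Verify the token now maps to exactly one ID.
ids = tokenizer.encode("</think>", add_special_tokens=False)
assert len(ids) == 1, f"still split into multiple pieces: {ids}"
```

If the token is genuinely new to the vocabulary, the model's embedding matrix must also be resized to match (`model.resize_token_embeddings(len(tokenizer))`) before the new ID is usable.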
Not a solution. The linked issue is not related to guided decoding. It does describe how special tokens get split sometimes.
You’re correct: the linked issue is about special token splitting, not guided decoding. For actual guided decoding (forcing the model to output a specific token at a certain point), vLLM supports custom logits processors, which can manipulate the logits to strongly favor a special token when needed. For example, you can implement a logits processor that, after a certain number of generated tokens or under specific conditions, sets all logits to -inf except for the special token, effectively forcing its generation. See the example in vllm/issues/15418.
Would you like a code example of such a logits processor for guided special token output?
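A minimal sketch of such a processor, assuming vLLM's older callable logits-processor interface (the V0 engine passes the token IDs generated so far plus the raw logits tensor); the token ID constant is a hypothetical value you would look up from your tokenizer:

```python
import torch

# Hypothetical ID for the "</think>" special token; in practice look it
# up with tokenizer.convert_tokens_to_ids("</think>").
THINK_END_ID = 151668  # assumption: Qwen3-style vocabulary

def make_force_think_end(budget: int):
    """Build a logits processor that forces </think> once the model has
    generated `budget` tokens without emitting it on its own."""
    def processor(generated_token_ids, logits):
        if len(generated_token_ids) >= budget and THINK_END_ID not in generated_token_ids:
            # Mask every token except </think>, so it is the only choice.
            forced = torch.full_like(logits, float("-inf"))
            forced[THINK_END_ID] = 0.0
            return forced
        return logits
    return processor
```

With the V0 engine this plugs in via `SamplingParams(logits_processors=[make_force_think_end(512)])`; the V1 engine replaced this hook with a class-based interface, so treat the callable form as an assumption about your vLLM version.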
The guidance backend has dedicated support for special tokens; see docs/syntax.md in the guidance-ai/llguidance repository on GitHub.
I believe xgrammar treats them as text.
You are correct: the guidance backend (llguidance) provides explicit syntax for special tokens (e.g., <|special_token|>) in grammars, allowing you to guide generation to output them as atomic units. In contrast, xgrammar currently treats special tokens as plain text and does not have dedicated support for atomic special token handling in its grammar format. See the llguidance special tokens docs and the xgrammar discussion.
Would you like more detail on how to use special tokens with the guidance backend in vLLM?
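A hedged sketch of how this might look in vLLM with the guidance backend. The grammar uses llguidance's Lark-like dialect; `<|think_end|>` is a hypothetical special-token reference (llguidance's docs/syntax.md describes the exact forms, including referencing tokens by ID), and the model name and rule names are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Lark-style llguidance grammar: free-form reasoning, then the atomic
# special token, then the final answer. <|think_end|> is a hypothetical
# reference; match it to a real special token in your tokenizer.
grammar = r"""
start: reasoning <|think_end|> answer
reasoning: /[^<]*/
answer: /.*/
"""

# Placeholder model; guided_decoding_backend selects llguidance.
llm = LLM(model="Qwen/Qwen3-8B", guided_decoding_backend="guidance")

params = SamplingParams(
    max_tokens=512,
    guided_decoding=GuidedDecodingParams(grammar=grammar),
)

outputs = llm.generate(["Solve: what is 17 * 23?"], params)
print(outputs[0].outputs[0].text)
```

Because the special token is referenced in the grammar itself, the backend constrains sampling to emit the single atomic token ID at that position, rather than letting the model spell it out as sub-tokens.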