I want to add some custom special tokens. As far as I know, they are stored in tokenizer.json and tokenizer_config.json: the former is used by the low-level Rust tokenizer, the latter by the Python-level tokenizer (e.g. tokenizer.all_special_tokens).
Which file is used during vLLM inference, and which one should I modify?
To add custom special tokens for vLLM inference, modify both tokenizer.json (for the underlying fast tokenizer) and tokenizer_config.json (for Python-level access such as tokenizer.all_special_tokens). vLLM loads the tokenizer with Hugging Face's AutoTokenizer.from_pretrained, which reads both files, so changes to both are respected during inference. See the vLLM Hugging Face integration docs and the tokenizer-loading code for details.
Would you like more detail on how to safely add tokens to both files?
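For example, here is a minimal sketch (the checkpoint path ./my-model is a placeholder): adding tokens through the tokenizer API and re-saving keeps both files in sync, instead of hand-editing the JSON.

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint directory; adjust to your model.
tokenizer = AutoTokenizer.from_pretrained("./my-model")

# Register the new special tokens on the in-memory tokenizer.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|my_token|>"]}
)
print(f"added {num_added} tokens, new vocab size: {len(tokenizer)}")

# save_pretrained rewrites tokenizer.json, tokenizer_config.json, and
# special_tokens_map.json consistently, so both the Rust fast tokenizer
# and the Python-level tokenizer see the new tokens.
tokenizer.save_pretrained("./my-model")
```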
Even though you are modifying tokenizer_config.json and tokenizer.json, the model will not pick up those tokens, since it was never trained on them. Hence it won't generate them.
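For completeness, the model side of this is usually handled by resizing the embedding matrix and then finetuning. A sketch, assuming a standard transformers checkpoint (paths are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./my-model")
model = AutoModelForCausalLM.from_pretrained("./my-model")

# Grow the input (and tied output) embeddings to cover the new ids.
# Recent transformers versions initialize the new rows from the mean of
# the existing embeddings, but they carry no learned meaning until the
# model is finetuned on data containing the new tokens.
model.resize_token_embeddings(len(tokenizer))
model.save_pretrained("./my-model")
```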
I've added the new token and its embedding.
Now I don't know how to make only the new embedding trainable within the modules that will be finetuned.
I use Unsloth.
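In case it helps, a sketch of one common approach (not an official Unsloth API for this; the checkpoint path and n_new are placeholders): make the embedding matrices trainable via Unsloth's continued-pretraining recipe, then zero out the gradient for every row except the newly added ones with a hook, so only the new embedding actually gets updated.

```python
import torch
from unsloth import FastLanguageModel

# Hypothetical checkpoint; adjust to your setup.
model, tokenizer = FastLanguageModel.from_pretrained(
    "./my-model", max_seq_length=2048, load_in_4bit=True
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    # Including embed_tokens / lm_head makes the embedding matrices
    # trainable (this is what Unsloth's continued-pretraining setup does).
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj",
                    "embed_tokens", "lm_head"],
)

# Assume the new tokens occupy the last n_new rows of the vocab.
n_new = 1
for name, param in model.named_parameters():
    if param.requires_grad and ("embed_tokens" in name or "lm_head" in name):
        mask = torch.zeros(param.shape[0], 1,
                           device=param.device, dtype=param.dtype)
        mask[-n_new:] = 1.0
        # Zero the gradient for all old rows; only new rows get updates.
        param.register_hook(lambda grad, m=mask: grad * m)
```

The full matrices are nominally trainable, but the hook masks every update outside the new rows, which is effectively the same as finetuning only the new embedding.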