How do I set a custom end token in the vllm serve CLI?

I want to set "\n" as an end token (like <|im_end|> and <|endoftext|>) in the vllm serve CLI. Is there any way to do this?

There is no documented way to set an arbitrary string (such as "\n") as an end-of-sequence (EOS) token directly via the vllm serve CLI. vLLM supports specifying stop sequences through the stop or stop_token_ids parameters in API requests, or by modifying the tokenizer configuration files to add special tokens, but registering a new EOS token via the CLI is not supported. For custom stop sequences, use the stop or stop_token_ids parameter in your API request payload, converting your string to its token ID with the tokenizer if needed. See the examples in vision_language.py and related issues.
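As a minimal sketch of the request side, assuming a vLLM server behind an OpenAI-compatible endpoint: the model name "my-model" is a placeholder, and 198 is the ID of "\n" in GPT-2-style BPE vocabularies, so look it up with your own tokenizer rather than hardcoding it:

```python
import json

# Sketch of an OpenAI-compatible /v1/chat/completions payload for vLLM.
# "my-model" is a placeholder; 198 is assumed to be the token ID of "\n"
# for your tokenizer (check with tokenizer("\n")["input_ids"]).
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Write one line."}],
    "max_tokens": 100,
    # Option 1: string-based stopping (standard OpenAI parameter).
    "stop": ["\n"],
    # Option 2: token-ID-based stopping (vLLM extension).
    "stop_token_ids": [198],
}

print(json.dumps(payload, indent=2))
```

Either parameter alone is usually enough; stop_token_ids is the more precise of the two when the stop sequence is a single known token.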

Would you like more detail on how to convert your string to a token ID and use it in API requests?


Hint: Mention @RunLLM in the post for followups.


That's useful. I just needed to add a stop_token_ids parameter to make it work, like:

from langchain_openai import ChatOpenAI

# inference_server_url points at the vLLM OpenAI-compatible endpoint,
# e.g. "http://localhost:8000/v1"
llm = ChatOpenAI(
    model="bad-woman",
    openai_api_key="vllm-key",  # must be a string, not a bare name
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.3,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
        "stop_token_ids": [198]
    }
)

The 198 is the token ID of \n, taken directly from the tokenizer's vocab.json.
Hmm, it seems I can also get that with tokenizer("\n"), which gives me:

{'input_ids': [198], 'attention_mask': [1]}

@RunLLM why does vLLM use stop_token_ids instead of the OpenAI-style stop parameter?

vLLM supports both the OpenAI-style stop (a string or list of strings) and stop_token_ids (a list of token IDs). stop_token_ids gives precise control, especially when the stop sequence is a special token or when string-based stopping fails due to detokenizer behavior (e.g., skip_special_tokens=True removes the special token from the decoded text before it can be matched). This ensures generation halts exactly at the desired token, which is sometimes not possible with string-based stop alone. See the SamplingParams docs and issue #792.
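A toy illustration of that failure mode, not vLLM internals: if the detokenizer skips special tokens, the stop string never appears in the decoded text, while a token-ID check still fires. The special-token table and ID 100257 below are made up for the example:

```python
# Hypothetical special-token table; 100257 is an invented special-token ID.
SPECIAL = {100257: "<|endoftext|>"}

def decode(token_ids, skip_special_tokens=True):
    """Minimal stand-in for a tokenizer's decode()."""
    pieces = []
    for tid in token_ids:
        if tid in SPECIAL:
            if not skip_special_tokens:
                pieces.append(SPECIAL[tid])
        else:
            pieces.append(chr(tid))  # pretend non-special IDs map to chars
    return "".join(pieces)

generated = [72, 105, 100257]  # "Hi" followed by the special token

# String-based stop check: misses, because decode() dropped the special token.
text = decode(generated, skip_special_tokens=True)
string_stop_hit = "<|endoftext|>" in text

# Token-ID-based stop check: fires regardless of detokenization.
token_stop_hit = 100257 in generated

print(string_stop_hit, token_stop_hit)  # False True
```

This is why stop_token_ids is the reliable choice when the stop sequence is a special token rather than ordinary text.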

Would you like more detail on the differences and use cases?
