How do I set a custom end token in the vllm serve CLI?

I want to set "\n" as an end token (like <|im_end|> and <|endoftext|>) in the vllm serve CLI. Is there any way to do this?

There is no documented way to set an arbitrary string (such as "\n") as an end-of-sequence (EOS) token directly via the vllm serve CLI. vLLM supports specifying stop sequences through the stop or stop_token_ids parameters in API requests, or by modifying the tokenizer configuration files to add special tokens, but registering a new EOS token via the CLI is not supported. For custom stop sequences, use the stop or stop_token_ids parameter in your API request payload, converting your string to its token ID with the tokenizer if needed. See the examples in vision_language.py and related issues.
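As a minimal sketch of the request side, assuming a vLLM server behind an OpenAI-compatible endpoint: the model name "my-model" is a placeholder, and 198 is the ID of "\n" in GPT-2-style BPE vocabularies, so look it up with your own tokenizer rather than hardcoding it:

```python
import json

# Sketch of an OpenAI-compatible /v1/chat/completions payload for vLLM.
# "my-model" is a placeholder; 198 is assumed to be the token ID of "\n"
# for your tokenizer (check with tokenizer("\n")["input_ids"]).
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Write one line."}],
    "max_tokens": 100,
    # Option 1: string-based stopping (standard OpenAI parameter).
    "stop": ["\n"],
    # Option 2: token-ID-based stopping (vLLM extension).
    "stop_token_ids": [198],
}

print(json.dumps(payload, indent=2))
```

Either parameter alone is usually enough; stop_token_ids is the more precise of the two when the stop sequence is a single known token.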

Would you like more detail on how to convert your string to a token ID and use it in API requests?


Hint: Mention @RunLLM in the post for followups.


That's useful. I just needed to add a stop_token_ids parameter to make it work, like:

from langchain_openai import ChatOpenAI

# inference_server_url points at the vLLM OpenAI-compatible endpoint,
# e.g. "http://localhost:8000/v1"
llm = ChatOpenAI(
    model="bad-woman",
    openai_api_key="vllm-key",  # must be a string, not a bare name
    openai_api_base=inference_server_url,
    max_tokens=1000,
    temperature=0.3,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
        "stop_token_ids": [198]
    }
)

The 198 is the token ID of \n, taken directly from the tokenizer's vocab.json.
Hmm, it seems I can also get that with tokenizer("\n"), which gives me:

{'input_ids': [198], 'attention_mask': [1]}

@RunLLM why does vLLM use stop_token_ids instead of the OpenAI-style stop parameter?

vLLM supports both the OpenAI-style stop (a string or list of strings) and stop_token_ids (a list of token IDs). stop_token_ids gives precise control, especially when the stop sequence is a special token or when string-based stopping fails due to detokenizer behavior (e.g., skip_special_tokens=True removes the special token from the decoded text before it can be matched). This ensures generation halts exactly at the desired token, which is sometimes not possible with string-based stop alone. See the SamplingParams docs and issue #792.
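A toy illustration of that failure mode, not vLLM internals: if the detokenizer skips special tokens, the stop string never appears in the decoded text, while a token-ID check still fires. The special-token table and ID 100257 below are made up for the example:

```python
# Hypothetical special-token table; 100257 is an invented special-token ID.
SPECIAL = {100257: "<|endoftext|>"}

def decode(token_ids, skip_special_tokens=True):
    """Minimal stand-in for a tokenizer's decode()."""
    pieces = []
    for tid in token_ids:
        if tid in SPECIAL:
            if not skip_special_tokens:
                pieces.append(SPECIAL[tid])
        else:
            pieces.append(chr(tid))  # pretend non-special IDs map to chars
    return "".join(pieces)

generated = [72, 105, 100257]  # "Hi" followed by the special token

# String-based stop check: misses, because decode() dropped the special token.
text = decode(generated, skip_special_tokens=True)
string_stop_hit = "<|endoftext|>" in text

# Token-ID-based stop check: fires regardless of detokenization.
token_stop_hit = 100257 in generated

print(string_stop_hit, token_stop_hit)  # False True
```

This is why stop_token_ids is the reliable choice when the stop sequence is a special token rather than ordinary text.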

Would you like more detail on the differences and use cases?
