I want to set "\n" as an end token, like <|im_end|> and <|endoftext|>. What should I do?
To set a custom stop string (e.g., "\n") in vLLM, add it to the `stop` list in your `SamplingParams` or API request. For example, use `stop=["\n", "<|im_end|>", "<|endoftext|>"]` in `SamplingParams`, or include it in the `stop` field of your API payload. Generation then halts as soon as any of these strings is produced, with no need to modify the tokenizer or model files. This approach is confirmed in multiple vLLM discussions about custom and special stop tokens, e.g., for Qwen and Llama models.
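For intuition, stop strings behave roughly like the post-hoc truncation below: output ends at the first occurrence of any stop string, and by default the stop string itself is not included in the returned text. This is a simplified sketch of the semantics, not vLLM's internals; `truncate_at_stop` is a made-up helper name.

```python
def truncate_at_stop(text: str, stops: list[str]) -> str:
    # Cut the text at the earliest occurrence of any stop string,
    # excluding the stop string itself from the result.
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1 and i < cut:
            cut = i
    return text[:cut]

print(truncate_at_stop("The answer is 42.\nNext line", ["\n", "<|im_end|>"]))
# -> The answer is 42.
```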
Example:

```python
from vllm import SamplingParams

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=2000,
    stop=["\n", "<|im_end|>", "<|endoftext|>"],
)
```
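When serving the model with the OpenAI-compatible server, the same stop strings can be passed per request in the `stop` field of the payload. A sketch, assuming a server started with `vllm serve` listening on `localhost:8000` and serving a model named `Qwen/Qwen2.5-7B-Instruct` (substitute your own model name):

```shell
# Assumes: vllm serve Qwen/Qwen2.5-7B-Instruct   (listening on localhost:8000)
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "prompt": "Q: What is the capital of France?\nA:",
    "max_tokens": 100,
    "stop": ["\n", "<|im_end|>", "<|endoftext|>"]
  }'
```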
Would you like more detail or a full code example?
Is there any way to set that in the vllm serve CLI?