I’m using multiple stop tokens, is there a way to identify which one actually stopped the generation?
Yes, vLLM reports which stop condition (string or token ID) ended generation. Each output carries a stop_reason field, which is set to the matched stop string or stop token ID that triggered the stop. This works with multiple entries in the stop and stop_token_ids parameters of SamplingParams; if generation ends for another reason (for example the model's EOS token or max_tokens), stop_reason is None. See test_stop_strings.py and test_stop_reason.py in the vLLM test suite for examples.
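For example, a minimal sketch of offline inference (the model name and stop values below are illustrative assumptions, not requirements):

```python
# Minimal sketch, assuming offline inference with a small illustrative model;
# the model name and stop values are examples only.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

params = SamplingParams(
    max_tokens=64,
    stop=["\n\n", "###"],    # stop strings
    stop_token_ids=[2],      # stop token IDs (2 is OPT's EOS id, used here for illustration)
)

outputs = llm.generate(["The capital of France is"], params)
out = outputs[0].outputs[0]

# finish_reason is "stop" when a stop condition matched;
# stop_reason holds the specific stop string or token ID that matched,
# and is None if generation ended for another reason (EOS, max_tokens).
print(out.finish_reason, repr(out.stop_reason))
```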
Would you like more detail?
Hint: Mention @RunLLM in the post for followups.
@RunLLM how do I get it in the OpenAI client?
When using the OpenAI client against a vLLM OpenAI-compatible server, stop_reason is included in each choice of the response and set to the stop string or token ID that ended generation. After a request that passes stop strings via stop and stop token IDs via extra_body, you can read it as completion.choices[0].stop_reason. This applies to both the completions and chat completions APIs, as described in vLLM's OpenAI-compatible server documentation.
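For example, a minimal sketch against a locally running vLLM server (the base URL, model name, and stop values are assumptions for illustration):

```python
# Minimal sketch, assuming a vLLM OpenAI-compatible server is already running
# at http://localhost:8000; the model name and stop values are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="facebook/opt-125m",
    prompt="The capital of France is",
    max_tokens=64,
    stop=["\n\n", "###"],                # stop strings (standard OpenAI parameter)
    extra_body={"stop_token_ids": [2]},  # vLLM-specific parameter
)

choice = completion.choices[0]
# vLLM adds a stop_reason field to each choice; the OpenAI client keeps fields
# it does not recognize, so read it as an attribute (defaulting to None).
print(choice.finish_reason, getattr(choice, "stop_reason", None))
```

The same pattern applies to client.chat.completions.create(...), where stop_reason appears on each chat choice.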
Would you like more detail?