Custom function-based stopping criteria

I’m wondering if vLLM currently supports something like a function-based stopper, where instead of stopping based on fixed stop strings or token IDs, the generation can be halted dynamically based on custom logic.

For example:

  • An agent generates text.

  • As it generates, I track a budget made up of:

    • Number of tokens generated so far.
    • Cost of external tools the agent has called (e.g., code execution, web search).
  • Once the budget is exhausted, the stopper would end generation immediately.
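The stopping condition in this example can be written as an ordinary predicate over a running budget. This is a minimal sketch in Python. `Budget`, `cost_per_token`, and the cost units are hypothetical illustrations and are not part of vLLM:

```python
from dataclasses import dataclass, field


@dataclass
class Budget:
    """Hypothetical budget combining generated tokens and external tool costs."""
    limit: float                 # total budget, in arbitrary cost units
    cost_per_token: float = 1.0  # assumed price of one generated token
    tokens: int = 0              # tokens generated so far
    tool_costs: list[float] = field(default_factory=list)  # one entry per tool call

    def spent(self) -> float:
        # Tokens and tool calls are charged against the same limit.
        return self.tokens * self.cost_per_token + sum(self.tool_costs)

    def exhausted(self) -> bool:
        # The "function-based stopper": True once the budget is used up.
        return self.spent() >= self.limit


budget = Budget(limit=100.0, cost_per_token=0.5)
budget.tokens += 150            # 150 tokens at 0.5 -> 75.0 units
budget.tool_costs.append(20.0)  # e.g. one code-execution call
print(budget.spent(), budget.exhausted())  # 95.0 False
```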

My questions:

  1. Is there already a way to hook into the generation loop and stop it based on this kind of dynamic condition?
  2. If not, what would be the best approach?

vLLM does not currently support a function-based stopper that evaluates custom logic (such as a token-plus-tool-cost budget) during generation. The stopping controls in SamplingParams are all static: fixed stop strings (stop), stop token IDs (stop_token_ids), a length cap (max_tokens), and the ignore_eos flag, which controls whether generation halts at the EOS token. All of them are fixed when the request is submitted, and none is checked against external state while tokens are being produced.

Users have asked for custom stop functions, but they are not implemented, mainly because of performance concerns and architectural complexity. The closest workaround is a custom logits processor, but logits processors act on the next token's logits rather than serving as a stopping hook, and they are not designed to track external state such as tool costs. See vllm/issue #551.
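For reference, the static controls listed above are set like this. The values are placeholders: the stop string is arbitrary, and the stop token ID is model-specific and shown only as an example.

```python
from vllm import SamplingParams

# All of these are fixed when the request is submitted; none of them is
# re-evaluated against external state (e.g. tool costs) during generation.
params = SamplingParams(
    max_tokens=512,           # hard cap on generated tokens
    stop=["</answer>"],       # stop strings (placeholder value)
    stop_token_ids=[128009],  # stop token IDs (model-specific placeholder)
    ignore_eos=False,         # keep stopping at the model's EOS token
)
```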

If you need this functionality, you would need to implement your own generation loop outside vLLM, calling the model step-by-step and applying your custom stopping logic after each step. Would you like more detail on how to implement such a workaround?
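A minimal sketch of such a loop, under stated assumptions: `step_fn` is a hypothetical stand-in for whatever produces the next token (for example, reading a streaming response one chunk at a time), and `should_stop` is any user-supplied predicate checked after every step. Neither is a vLLM API; the fake model and budget below exist only for illustration.

```python
from typing import Callable


def generate_until(
    step_fn: Callable[[list[str]], str],
    should_stop: Callable[[list[str]], bool],
    eos: str = "<eos>",
    max_steps: int = 1000,
) -> list[str]:
    """Call step_fn one token at a time, checking should_stop after each step."""
    tokens: list[str] = []
    for _ in range(max_steps):
        token = step_fn(tokens)   # produce the next token given the prefix
        if token == eos:          # the model's own stop signal still applies
            break
        tokens.append(token)
        if should_stop(tokens):   # custom, dynamically evaluated stop condition
            break
    return tokens


# Hypothetical usage: a fake model that never emits EOS, and a budget that
# mixes token count with a tool cost incurred partway through generation.
tool_costs: list[float] = []


def fake_step(prefix: list[str]) -> str:
    if len(prefix) == 3:
        tool_costs.append(4.0)    # pretend the agent called a tool here
    return f"tok{len(prefix)}"


def over_budget(prefix: list[str]) -> bool:
    return len(prefix) + sum(tool_costs) >= 10


print(generate_until(fake_step, over_budget))
# ['tok0', 'tok1', 'tok2', 'tok3', 'tok4', 'tok5']
```

Checking a Python predicate after every token is simple but adds per-step overhead, which is related to the performance concern mentioned above; checking every N tokens is one possible trade-off.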

Sources: