vLLM V1 Scheduler: Inconsistent Request Scheduling Under Token Budget Limit

RunLLM · July 30, 2025, 7:27am

The inconsistent scheduling you observe, where not all requests are scheduled in the first round even though the total token requirement exactly matches the token budget, is a known issue in vLLM V1’s scheduler logic. It is not intended behavior. It comes from the way the scheduler iterates through the waiting queue and checks the token budget, especially when chunked prefill is not enabled. If the scheduler encounters a request that cannot be scheduled (e.g., because of a strict num_new_tokens > token_budget check), it may break out of the scheduling loop early rather than move on to the next requests. The result is an underutilized token budget and batch sizes that vary from step to step. This is confirmed by the code and by similar issues reported by users (see related discussion, see scheduler code).

Specifically, in the loop that schedules WAITING requests, if a request’s num_new_tokens exceeds the remaining token_budget and chunked prefill is not enabled, the scheduler breaks out of the loop instead of skipping that request and trying the next one. Every request behind it in the queue then waits for a later step, even if it would have fit in the remaining budget. This causes the inconsistent and suboptimal scheduling you see, particularly when a request does not fit in the budget left over by the requests ahead of it, or when the order of requests in the queue varies. It is a known limitation and has been discussed as a source of confusion and inefficiency (see issue, see code).
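
To make the break-versus-skip difference concrete, here is a toy model of a single pass over the waiting queue. It is not vLLM's scheduler code: the function name `schedule_waiting`, the `skip_unfit` flag, and the prompt lengths are all hypothetical, and the sketch assumes only what is described above (first-come-first-served order, no chunked prefill, a strict num_new_tokens > token_budget check).

```python
from collections import deque


def schedule_waiting(prompt_lens, token_budget, skip_unfit=False):
    """Toy single-step pass over a FCFS waiting queue (illustrative only).

    prompt_lens  -- tokens each waiting request needs, in queue order
    token_budget -- tokens available in this scheduling step
    skip_unfit   -- False: stop at the first request that does not fit
                    True:  set that request aside and keep scanning
    Returns (indices of scheduled requests, unused token budget).
    """
    waiting = deque(enumerate(prompt_lens))
    skipped = deque()  # requests deferred to a later step
    scheduled = []

    while waiting and token_budget > 0:
        req_id, num_new_tokens = waiting[0]
        # Without chunked prefill a request must fit as a whole.
        if num_new_tokens > token_budget:
            if not skip_unfit:
                break  # everything behind this request waits too
            skipped.append(waiting.popleft())
            continue
        waiting.popleft()
        scheduled.append(req_id)
        token_budget -= num_new_tokens

    return scheduled, token_budget


if __name__ == "__main__":
    lens, budget = [600, 500, 400], 1000
    print(schedule_waiting(lens, budget))                   # ([0], 400)
    print(schedule_waiting(lens, budget, skip_unfit=True))  # ([0, 2], 0)
```

With these made-up numbers, the break policy admits only request 0 and leaves 400 tokens unused, while the skip policy also admits request 2 and uses the full budget. Reordering the same three requests changes the result under the break policy, which matches the order-dependent batch sizes described above.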

Would you like a step-by-step breakdown of the relevant code paths and suggestions for possible workarounds or patches?
