Question about scheduling with V1 pipeline parallelism

If I start a server with --pipeline-parallel-size 2 and a request req_a is scheduled in batch_0, what is the status of req_a when batch_1 is being scheduled? RequestStatus.WAITING, RequestStatus.RUNNING, or something else?

When using --pipeline-parallel-size 2, if request req_a is scheduled in batch_0, then while batch_1 is being scheduled, the status of req_a is RequestStatus.RUNNING. Once a request has been scheduled and is being processed in the pipeline, it stays in the RUNNING state until it either finishes or is preempted; it is not moved back to WAITING between pipeline stages or batches. This behavior is confirmed by the scheduler logic and test cases, which show that requests remain RUNNING while they are being processed, even as new batches are scheduled for pipeline parallelism (test_scheduler.py, request.py).
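
For illustration, here is a minimal sketch of that lifecycle, using simplified stand-ins for the Request and RequestStatus types from request.py (the real classes carry far more state):

```python
from enum import Enum, auto

# Simplified stand-ins for vLLM's RequestStatus and Request (see request.py);
# this only models the status transitions discussed above.
class RequestStatus(Enum):
    WAITING = auto()    # queued, not yet scheduled
    RUNNING = auto()    # scheduled and being processed
    PREEMPTED = auto()  # evicted, will be rescheduled later
    FINISHED = auto()   # done, leaves the scheduler

class Request:
    def __init__(self, req_id: str) -> None:
        self.req_id = req_id
        self.status = RequestStatus.WAITING

req_a = Request("req_a")

# batch_0 is formed: req_a transitions WAITING -> RUNNING.
req_a.status = RequestStatus.RUNNING

# While batch_1 is being scheduled, req_a has neither finished nor been
# preempted, so it is still RUNNING.
assert req_a.status == RequestStatus.RUNNING
```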

Would you like a step-by-step breakdown of the scheduling and status transitions in pipeline parallelism?


Will req_a be scheduled into batch_1?

No, req_a will not be scheduled into batch_1 while it is still being processed in batch_0. In vLLM's pipeline parallelism, each request is scheduled into only one in-flight batch at a time: it remains in the RUNNING state and is not rescheduled into another batch until its current processing is complete, unless special chunked-prefill logic changes this behavior (test_engine_core.py, test_scheduler.py).
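
To make the timing concrete, here is a hypothetical timeline of two in-flight batches under --pipeline-parallel-size 2 (req_b and the batch-queue layout are assumptions for illustration, not vLLM's actual data structures):

```python
from collections import deque

# With pipeline parallelism, up to PP-size batches can be in flight at once.
batch_queue: deque = deque(maxlen=2)

# Step 0: req_a is scheduled into batch_0 and submitted to the pipeline.
batch_queue.append({"batch": "batch_0", "reqs": ["req_a"]})

# Step 1: batch_0 has not returned output yet, so batch_1 is formed from
# other schedulable requests (say, a newly arrived req_b), never from req_a.
batch_queue.append({"batch": "batch_1", "reqs": ["req_b"]})

# Only after batch_0's output comes back does req_a become schedulable again.
done = batch_queue.popleft()
assert "req_a" in done["reqs"]
```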

Would you like more detail on how chunked prefill or advanced scheduling policies might affect this behavior?


How does vLLM prevent req_a from being scheduled into batch_1?

vLLM prevents req_a from being scheduled into batch_1 by tracking scheduled requests in the scheduler. Once a request is scheduled and placed in the RUNNING state, it is recorded so that the scheduler does not select it again for a new batch until its current processing is finished. This is implemented with a set (e.g., scheduled_req_ids) that ensures a request is not rescheduled before its outputs are returned ([V1][PP] Run engine busy loop with batch queue by comaniac · Pull Request #13064 · vllm-project/vllm · GitHub).
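
As a rough sketch of that mechanism, here is a heavily simplified scheduler (ToyScheduler and its method names are made up for illustration; only the scheduled_req_ids idea comes from the PR):

```python
class ToyScheduler:
    def __init__(self) -> None:
        self.running: list[str] = []               # requests in RUNNING state
        self.scheduled_req_ids: set[str] = set()   # requests with a batch in flight

    def schedule(self) -> list[str]:
        # Skip RUNNING requests that already have a batch in the pipeline.
        batch = [r for r in self.running if r not in self.scheduled_req_ids]
        self.scheduled_req_ids.update(batch)
        return batch

    def update_from_output(self, returned: list[str]) -> None:
        # Once a batch's outputs are returned, its requests become
        # schedulable again for the next step.
        self.scheduled_req_ids.difference_update(returned)

sched = ToyScheduler()
sched.running = ["req_a", "req_b"]

batch_0 = sched.schedule()   # ['req_a', 'req_b']
batch_1 = sched.schedule()   # [] -- both requests still have outputs pending

sched.update_from_output(batch_0)
batch_2 = sched.schedule()   # ['req_a', 'req_b'] are schedulable again
```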

Would you like a step-by-step explanation of how the scheduler enforces this constraint?
