Question about scheduling with V1 pipeline parallelism

If I start a server with --pipeline-parallel-size 2 and a request req_a is scheduled in batch_0, what is the status of req_a when batch_1 is being scheduled? RequestStatus.WAITING, RequestStatus.RUNNING, or something else?

When using --pipeline-parallel-size 2, if request req_a is scheduled in batch_0, then while batch_1 is being scheduled, the status of req_a is RequestStatus.RUNNING. Once a request has been scheduled and is being processed in the pipeline, it stays in the RUNNING state until it either finishes or is preempted; it is not moved back to WAITING between pipeline stages or batches. This behavior is confirmed by the scheduler logic and test cases, which show that requests remain RUNNING while they are being processed, even as new batches are scheduled for pipeline parallelism (test_scheduler.py, request.py).
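
For illustration, here is a minimal sketch of that lifecycle, using simplified stand-ins for the Request and RequestStatus types from request.py (the real classes carry far more state):

```python
from enum import Enum, auto

# Simplified stand-ins for vLLM's RequestStatus and Request (see request.py);
# this only models the status transitions discussed above.
class RequestStatus(Enum):
    WAITING = auto()    # queued, not yet scheduled
    RUNNING = auto()    # scheduled and being processed
    PREEMPTED = auto()  # evicted, will be rescheduled later
    FINISHED = auto()   # done, leaves the scheduler

class Request:
    def __init__(self, req_id: str) -> None:
        self.req_id = req_id
        self.status = RequestStatus.WAITING

req_a = Request("req_a")

# batch_0 is formed: req_a transitions WAITING -> RUNNING.
req_a.status = RequestStatus.RUNNING

# While batch_1 is being scheduled, req_a has neither finished nor been
# preempted, so it is still RUNNING.
assert req_a.status == RequestStatus.RUNNING
```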

Would you like a step-by-step breakdown of the scheduling and status transitions in pipeline parallelism?


Will req_a be scheduled into batch_1?

No, req_a will not be scheduled into batch_1 while it is still being processed in batch_0. In vLLM's pipeline parallelism, each request is scheduled into only one in-flight batch at a time: it remains in the RUNNING state and is not rescheduled into another batch until its current processing is complete, unless special chunked-prefill logic changes this behavior (test_engine_core.py, test_scheduler.py).
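
To make the timing concrete, here is a hypothetical timeline of two in-flight batches under --pipeline-parallel-size 2 (req_b and the batch-queue layout are assumptions for illustration, not vLLM's actual data structures):

```python
from collections import deque

# With pipeline parallelism, up to PP-size batches can be in flight at once.
batch_queue: deque = deque(maxlen=2)

# Step 0: req_a is scheduled into batch_0 and submitted to the pipeline.
batch_queue.append({"batch": "batch_0", "reqs": ["req_a"]})

# Step 1: batch_0 has not returned output yet, so batch_1 is formed from
# other schedulable requests (say, a newly arrived req_b), never from req_a.
batch_queue.append({"batch": "batch_1", "reqs": ["req_b"]})

# Only after batch_0's output comes back does req_a become schedulable again.
done = batch_queue.popleft()
assert "req_a" in done["reqs"]
```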

Would you like more detail on how chunked prefill or advanced scheduling policies might affect this behavior?


How does vLLM prevent req_a from being scheduled into batch_1?

vLLM prevents req_a from being scheduled into batch_1 by tracking scheduled requests in the scheduler. Once a request is scheduled and placed in the RUNNING state, it is recorded so that the scheduler does not select it again for a new batch until its current processing is finished. This is implemented with a set (e.g., scheduled_req_ids) that ensures a request is not rescheduled before its outputs are returned ([V1][PP] Run engine busy loop with batch queue by comaniac · Pull Request #13064 · vllm-project/vllm · GitHub).
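
As a rough sketch of that mechanism, here is a heavily simplified scheduler (ToyScheduler and its method names are made up for illustration; only the scheduled_req_ids idea comes from the PR):

```python
class ToyScheduler:
    def __init__(self) -> None:
        self.running: list[str] = []               # requests in RUNNING state
        self.scheduled_req_ids: set[str] = set()   # requests with a batch in flight

    def schedule(self) -> list[str]:
        # Skip RUNNING requests that already have a batch in the pipeline.
        batch = [r for r in self.running if r not in self.scheduled_req_ids]
        self.scheduled_req_ids.update(batch)
        return batch

    def update_from_output(self, returned: list[str]) -> None:
        # Once a batch's outputs are returned, its requests become
        # schedulable again for the next step.
        self.scheduled_req_ids.difference_update(returned)

sched = ToyScheduler()
sched.running = ["req_a", "req_b"]

batch_0 = sched.schedule()   # ['req_a', 'req_b']
batch_1 = sched.schedule()   # [] -- both requests still have outputs pending

sched.update_from_output(batch_0)
batch_2 = sched.schedule()   # ['req_a', 'req_b'] are schedulable again
```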

Would you like a step-by-step explanation of how the scheduler enforces this constraint?
