I think execution is still serial in the V1 engine of vLLM 0.7.3: schedule n → forward n → schedule n+1 → forward n+1. Has this feature already been implemented in vLLM v0.7.3 or v0.8.3?
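To make the distinction concrete, here is a minimal, hypothetical sketch (not vLLM code) contrasting the serial loop described above with an async variant that overlaps schedule(n+1) with forward(n); the `schedule` and `forward` functions are placeholders, and a real engine would run the forward pass on the GPU while the scheduler runs on the CPU:

```python
# Hypothetical sketch: serial vs. overlapped (async) scheduling.
from concurrent.futures import ThreadPoolExecutor


def schedule(step):
    # Placeholder for CPU-side scheduler work (batching, KV-cache bookkeeping).
    return f"batch-{step}"


def forward(batch):
    # Placeholder for the model forward pass (GPU work in a real engine).
    return f"output-for-{batch}"


def run_serial(num_steps):
    # Serial loop: forward(n) cannot start until schedule(n) finishes,
    # so scheduler overhead adds directly to each step's latency.
    outputs = []
    for step in range(num_steps):
        batch = schedule(step)
        outputs.append(forward(batch))
    return outputs


def run_async(num_steps):
    # Overlapped loop: while forward(n) runs, schedule(n+1) executes on a
    # worker thread, hiding scheduler latency behind the forward pass.
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        next_batch = pool.submit(schedule, 0)
        for step in range(num_steps):
            batch = next_batch.result()
            if step + 1 < num_steps:
                next_batch = pool.submit(schedule, step + 1)
            outputs.append(forward(batch))
    return outputs
```

Both variants produce the same outputs; the async version only changes when the scheduling work happens relative to the forward pass.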
It has not, and we do not plan to add it, since the scheduler overhead is not very significant for models of 8B parameters and above. We plan to continue optimizing the scheduler directly (e.g., with C++ bindings) rather than implementing async scheduling.
The core reasons for not implementing async scheduling are code complexity and incompatibility with other features such as speculative decoding.
This is of course subject to change, but it is the current plan.
New updates:
There has been progress on this topic.
A new pull request titled “Implement Async Scheduling” was opened on June 23, 2025, by WoosukKwon. It introduces an async scheduler to vLLM with minimal code changes, aiming to improve scheduling efficiency. The PR description includes a checklist covering purpose, test plan, test results, and optional documentation updates, but detailed test results and documentation are not provided in the description itself. Review comments flag potentially redundant method calls and ask for clarification on temporary code changes and optimizations, and merge conflicts must be resolved before the PR can be merged. For more, see the discussion and code in the “Implement Async Scheduling” PR.