Does vLLM V1 support asynchronous scheduling?

I think vLLM V1 in version 0.7.3 still uses serial execution: schedule n → forward n → schedule n+1 → forward n+1. Is this feature already implemented in vLLM v0.7.3 or v0.8.3?
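
To make the question concrete, here is a toy sketch (not vLLM code; `schedule` and `forward` are just placeholders with sleeps standing in for scheduler and GPU time) of the serial loop versus the overlapped loop I am asking about:

```python
import concurrent.futures
import time


def schedule(step: int) -> str:
    time.sleep(0.01)   # stand-in for scheduler overhead
    return f"batch-{step}"


def forward(batch: str) -> None:
    time.sleep(0.05)   # stand-in for the GPU forward pass


def serial_loop(num_steps: int) -> None:
    # Serial: schedule n -> forward n -> schedule n+1 -> forward n+1 ...
    for step in range(num_steps):
        forward(schedule(step))


def async_loop(num_steps: int) -> None:
    # Async scheduling: while forward(n) runs, schedule(n+1) is prepared
    # on another thread, so the scheduler overhead is hidden behind GPU work.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        next_batch = pool.submit(schedule, 0)
        for step in range(num_steps):
            batch = next_batch.result()
            if step + 1 < num_steps:
                next_batch = pool.submit(schedule, step + 1)
            forward(batch)


if __name__ == "__main__":
    for name, loop in [("serial", serial_loop), ("async", async_loop)]:
        start = time.perf_counter()
        loop(20)
        print(f"{name}: {time.perf_counter() - start:.2f}s")
```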

It does not, and we do not plan to add it, since the scheduler overhead is not too significant for models with 8B+ parameters. We plan to continue optimizing the scheduler directly (e.g. with C++ bindings) rather than implementing async scheduling.

The core reasons for not implementing async scheduling are code complexity and incompatibility with other features like speculative decoding.

This is of course subject to change, but it is the current plan.