Does vLLM V1 support asynchronous scheduling?

I think vLLM V1 in version 0.7.3 still uses serial execution: schedule n → forward n → schedule n+1 → forward n+1. Is this feature already implemented in vLLM v0.7.3 or v0.8.3?
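
To make the question concrete, here is a toy sketch (not vLLM code; `schedule` and `forward` are just placeholders with sleeps standing in for scheduler and GPU time) of the serial loop versus the overlapped loop I am asking about:

```python
import concurrent.futures
import time


def schedule(step: int) -> str:
    time.sleep(0.01)   # stand-in for scheduler overhead
    return f"batch-{step}"


def forward(batch: str) -> None:
    time.sleep(0.05)   # stand-in for the GPU forward pass


def serial_loop(num_steps: int) -> None:
    # Serial: schedule n -> forward n -> schedule n+1 -> forward n+1 ...
    for step in range(num_steps):
        forward(schedule(step))


def async_loop(num_steps: int) -> None:
    # Async scheduling: while forward(n) runs, schedule(n+1) is prepared
    # on another thread, so the scheduler overhead is hidden behind GPU work.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        next_batch = pool.submit(schedule, 0)
        for step in range(num_steps):
            batch = next_batch.result()
            if step + 1 < num_steps:
                next_batch = pool.submit(schedule, step + 1)
            forward(batch)


if __name__ == "__main__":
    for name, loop in [("serial", serial_loop), ("async", async_loop)]:
        start = time.perf_counter()
        loop(20)
        print(f"{name}: {time.perf_counter() - start:.2f}s")
```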

It does not, and we do not plan to add it, since the scheduler overhead is not too significant for models with 8B+ parameters. We plan to continue optimizing the scheduler directly (e.g. with C++ bindings) rather than implementing async scheduling.

The core reasons for not implementing async scheduling are code complexity and incompatibility with other features like speculative decoding.

This is of course subject to change, but it is the current plan.