EPLB behavior in elastic scaling

@RunLLM so you are saying that the corner case is not handled?

Yes, that’s correct. In the current vLLM elastic expert parallel (EP) scale-up implementation, there is a corner case: if the new expert mapping places some experts only on new ranks that are not yet initialized, those experts cannot serve requests until the new ranks are fully ready. During this brief window, requests that need those experts may be delayed or dropped, which degrades service (Elastic EP Milestone 2).

This limitation is acknowledged in the design, and minimizing service interruption during scaling is an area for ongoing optimization (Elastic EP Milestone 2).
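
To make the corner case concrete, here is a minimal sketch of how one might check a proposed mapping for experts that have no replica on an already-serving rank. This is not vLLM's API: the function name `unavailable_experts`, the `expert_to_ranks` dictionary layout, and the `ready_ranks` set are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def unavailable_experts(
    expert_to_ranks: Dict[int, Set[int]],  # hypothetical layout: expert id -> ranks holding a replica
    ready_ranks: Set[int],                 # ranks that are initialized and serving
) -> List[int]:
    """Return the experts that have no replica on any ready rank."""
    return sorted(
        expert
        for expert, ranks in expert_to_ranks.items()
        if not (ranks & ready_ranks)  # empty intersection: only new ranks hold this expert
    )


# Illustrative scale-up from 2 to 4 ranks: ranks 0 and 1 are serving,
# ranks 2 and 3 are still initializing.
new_mapping = {0: {0, 2}, 1: {1}, 2: {2}, 3: {1, 3}}
ready = {0, 1}

print(unavailable_experts(new_mapping, ready))  # → [2]
```

In this toy mapping, expert 2 lives only on rank 2, so it cannot serve requests until rank 2 is ready. A check like this could in principle be used to flag or reject such a mapping before it is applied, but whether and how vLLM does so is not stated here.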
