EPLB behavior in elastic scaling

xeonliu · November 28, 2025, 10:11am

@RunLLM, so you are saying that the corner case is not handled?

RunLLM · November 28, 2025, 10:11am

Yes, that’s correct. The current vLLM elastic expert parallel (EP) scale-up implementation has an unhandled corner case. If the new expert mapping assigns some experts only to the newly added ranks, which are not yet initialized, those experts cannot serve requests until the new ranks are fully ready. During this brief window, requests that need those experts may be delayed or dropped, degrading service (Elastic EP Milestone 2).
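To make the corner case concrete, here is a minimal sketch that finds which logical experts have no replica on an already-initialized rank under a new placement. It is a hypothetical illustration only: the function name `experts_without_ready_replica`, the rank-to-experts dictionary format, and the example placement are assumptions and do not mirror vLLM's internal EPLB data structures.

```python
from typing import Dict, List, Set


def experts_without_ready_replica(
    new_placement: Dict[int, List[int]],
    ready_ranks: Set[int],
) -> Set[int]:
    """Return logical expert IDs whose replicas all sit on not-yet-ready ranks.

    Hypothetical format: new_placement maps an EP rank to the logical expert
    IDs it hosts under the post-scale-up mapping; ready_ranks are the ranks
    that have finished initialization and hold loaded weights.
    """
    served: Set[int] = set()       # experts with at least one replica on a ready rank
    all_experts: Set[int] = set()  # every expert that appears in the new placement
    for rank, experts in new_placement.items():
        all_experts.update(experts)
        if rank in ready_ranks:
            served.update(experts)
    # Experts placed only on new ranks cannot be served until those ranks are ready.
    return all_experts - served


# Example: scale up from 2 ranks (0, 1) to 4 ranks; ranks 2 and 3 are still initializing.
placement = {
    0: [0, 1, 2],
    1: [3, 4, 5],
    2: [6, 1],
    3: [7, 4],
}
print(sorted(experts_without_ready_replica(placement, ready_ranks={0, 1})))  # [6, 7]
```

In this example, experts 6 and 7 exist only on ranks 2 and 3, so they are the ones that would be unavailable during the window. Experts 1 and 4 also get replicas on the new ranks, but they remain servable through ranks 0 and 1.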

This limitation is acknowledged in the design, and minimizing service interruption during scaling is an area for ongoing optimization (Elastic EP Milestone 2).

Sources:

Elastic EP Milestone 2


Related topics

Topic	Category	Replies	Views	Activity
[Question] Confirmation on Physical Data Movement and Runtime Dynamics in rearrange_expert_weights_inplace	General	5	80	November 24, 2025
Why is the EP size = tp*dp?	General	8	181	November 25, 2025
How to configure the parameters of EPLB?	General	3	370	August 18, 2025
Intergate async-llm to openrlfh, when tensor_parallel_size > 1 will cause ray-error	OpenRLHF	1	231	April 27, 2025
Some question abound Data Parallel examples	General	7	567	June 21, 2025