I deployed Qwen3-235B-A22B-FP8 on 8x H20-141GB GPUs. Initially, I used the parallelism strategy --tensor-parallel-size 8 --enable-expert-parallel. Through nsys profiling, I found a severe desynchronization problem in NCCL all-reduce, caused by unbalanced expert routing.
I noticed that vLLM v0.10.1 added EPLB support for Qwen3-MoE. So I tried the parameters --tensor-parallel-size 8 --enable-expert-parallel --enable-eplb --num-redundant-experts 16 --eplb-window-size 100 --eplb-step-interval 100. However, performance deteriorated.
The command is: vllm serve /data/models/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 --served-model-name "Qwen3-235B-A22B-Instruct-2507-FP8" --max-model-len 16384 --trust-remote-code --gpu-memory-utilization 0.96 --tensor-parallel-size 8 --enable-expert-parallel --enable-eplb --num-redundant-experts 16 --eplb-window-size 100 --eplb-step-interval 100 --port 8112
Could you please advise on how to configure the EPLB parameters in the deployment of Qwen3-MOE to achieve better performance?
For Qwen3-MoE with EPLB, vLLM recommends using a larger --eplb-window-size (e.g., 1000) and a larger --eplb-step-interval (e.g., 3000) to reduce overhead, as frequent rebalancing can degrade performance. Also, for large-scale deployments, set --num-redundant-experts to 32 to ensure popular experts are always available. Your current settings (--eplb-window-size 100 --eplb-step-interval 100) are too aggressive and likely cause excessive synchronization and communication overhead, leading to worse performance.
Try the following configuration for better load balancing and throughput:
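A sketch of such an invocation, adapting the serve command from your question (model path, port, and non-EPLB flags are yours; the EPLB values are illustrative starting points, not a verified optimum):

```shell
vllm serve /data/models/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
  --served-model-name "Qwen3-235B-A22B-Instruct-2507-FP8" \
  --max-model-len 16384 \
  --trust-remote-code \
  --gpu-memory-utilization 0.96 \
  --tensor-parallel-size 8 \
  --enable-expert-parallel \
  --enable-eplb \
  --num-redundant-experts 32 \
  --eplb-window-size 1000 \
  --eplb-step-interval 3000 \
  --port 8112
```

Benchmark this against your EPLB-disabled baseline under your real traffic before adopting it.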
Thank you for your reply.
I used --enable-eplb
--num-redundant-experts 16
--eplb-window-size 1000
--eplb-step-interval 3000
The above parameters improved performance compared to before, but still did not surpass the performance with EPLB disabled. This is on the 8x H20-141GB environment. I think --num-redundant-experts 32 is too large for this setup, so I chose 16 instead.
Your observation is valid: while EPLB can improve load balance, it may introduce overhead, especially if the number of redundant experts is too high for your hardware or workload. The optimal value for --num-redundant-experts depends on the model, traffic pattern, and GPU memory; 16 is reasonable if 32 causes memory or performance issues. However, in some cases, disabling EPLB may still yield better throughput due to reduced synchronization and weight transfer overhead, especially on high-bandwidth, low-latency clusters or when expert imbalance is not severe.
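The memory cost of redundant experts can be sanity-checked with quick arithmetic. Assuming 128 routed experts per MoE layer for Qwen3-235B-A22B (verify against the model's config.json) spread across 8 GPUs with expert parallelism:

```shell
# Back-of-envelope expert placement on 8 GPUs.
# NUM_EXPERTS=128 is an assumption for Qwen3-235B-A22B; check config.json.
NUM_EXPERTS=128
NUM_GPUS=8

for REDUNDANT in 0 16 32; do
  PHYSICAL=$((NUM_EXPERTS + REDUNDANT))
  PER_GPU=$((PHYSICAL / NUM_GPUS))
  # Extra expert-weight memory relative to the no-redundancy baseline, percent.
  OVERHEAD=$((100 * REDUNDANT / NUM_EXPERTS))
  echo "redundant=$REDUNDANT physical_experts_per_gpu=$PER_GPU weight_overhead=${OVERHEAD}%"
done
```

Under these assumptions, 16 redundant experts means 18 physical experts per GPU (about 12% more expert-weight memory), while 32 means 20 per GPU (25% more), so preferring 16 on a memory-constrained 8-GPU node is a defensible choice.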
If EPLB does not outperform the baseline, you can try further tuning: reduce --num-redundant-experts (e.g., 8 or 12), increase --eplb-step-interval (e.g., 5000), or leave --eplb-log-balancedness off to avoid logging overhead. Ultimately, the best configuration is workload- and hardware-dependent, and some environments may benefit more from static expert assignment. For more details, see the Expert Parallel Load Balancer (EPLB) documentation.
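Concretely, one such lower-overhead trial might look like the following (only the EPLB-related flags are shown; keep the rest of your original serve command unchanged; the values 8 and 5000 are illustrative, not verified optima):

```shell
# Lower-overhead EPLB trial: fewer redundant experts, less frequent rebalancing.
vllm serve /data/models/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
  --tensor-parallel-size 8 \
  --enable-expert-parallel \
  --enable-eplb \
  --num-redundant-experts 8 \
  --eplb-window-size 1000 \
  --eplb-step-interval 5000
```

Compare each variant against the EPLB-disabled baseline under representative traffic, since EPLB's benefit depends on how skewed your expert routing actually is.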
Would you like a step-by-step breakdown of how to tune these parameters for your specific setup?