Started with DP 8 and without enabling --enable_expert_parallel, so why do the logs still print an EP rank?

[parallel_state.py:1083] rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_6 pid=33041) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_1 pid=33036) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 1 in world size 8 is assigned as DP rank 1, PP rank 0, TP rank 0, EP rank 1
(EngineCore_2 pid=33037) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 2 in world size 8 is assigned as DP rank 2, PP rank 0, TP rank 0, EP rank 2
(EngineCore_5 pid=33040) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_4 pid=33039) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_3 pid=33038) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_7 pid=33042) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_6 pid=33041) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 6 in world size 8 is assigned as DP rank 6, PP rank 0, TP rank 0, EP rank 6
(EngineCore_5 pid=33040) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 5 in world size 8 is assigned as DP rank 5, PP rank 0, TP rank 0, EP rank 5
(EngineCore_4 pid=33039) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 4 in world size 8 is assigned as DP rank 4, PP rank 0, TP rank 0, EP rank 4
(EngineCore_3 pid=33038) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 3 in world size 8 is assigned as DP rank 3, PP rank 0, TP rank 0, EP rank 3
(EngineCore_7 pid=33042) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 7 in world size 8 is assigned as DP rank 7, PP rank 0, TP rank 0, EP rank 7

This log means that in your vLLM distributed setup with 8 processes (world size 8), each process is assigned a unique Data Parallel (DP) rank from 0 to 7, while all have Pipeline Parallel (PP) rank 0, Tensor Parallel (TP) rank 0, and an Expert Parallel (EP) rank matching its DP rank. This indicates pure data parallelism: each process holds a full model replica, and the model is not split across GPUs by tensor or pipeline parallelism. The EP rank shows up even though --enable_expert_parallel is off because this rank-assignment line in parallel_state.py always reports the EP rank alongside the DP/PP/TP ranks; seeing an EP rank in the log does not by itself mean expert parallelism is active. The “naive all2all manager” message from cuda_communicator.py is likewise the default communication backend chosen for this configuration.
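As a rough illustration of why the EP rank equals the DP rank here, below is a minimal sketch of one common rank-decomposition layout. This is not vLLM's actual group-construction code (that lives in vllm/distributed/parallel_state.py); it only shows how, with TP = PP = 1, the EP rank collapses to the DP rank:

```python
# Illustrative sketch (not vLLM's actual code): one common way to decompose a
# global rank into DP/PP/TP ranks, with the EP group spanning DP x TP.
def decompose_rank(rank: int, dp_size: int, pp_size: int, tp_size: int):
    tp_rank = rank % tp_size
    pp_rank = (rank // tp_size) % pp_size
    dp_rank = rank // (tp_size * pp_size)
    # The EP group typically spans the DP x TP ranks, so with
    # tp_size == pp_size == 1 the EP rank is just the DP rank.
    ep_rank = dp_rank * tp_size + tp_rank
    return dp_rank, pp_rank, tp_rank, ep_rank

for r in range(8):  # world size 8, dp=8, pp=1, tp=1, as in the log above
    dp, pp, tp, ep = decompose_rank(r, dp_size=8, pp_size=1, tp_size=1)
    print(f"rank {r}: DP {dp}, PP {pp}, TP {tp}, EP {ep}")
```

Running this reproduces exactly the assignments in your log: DP rank, EP rank, and global rank all coincide, while PP and TP ranks stay 0.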

This assignment is expected if you set --data-parallel-size=8 and left the tensor and pipeline parallel sizes at their default of 1. For more details on how these ranks are assigned and what they mean, see the explanations in parallel_state.py and the API documentation. Would you like a step-by-step breakdown of how these groups are constructed?
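If you want to double-check a run, here is a small sketch that parses the rank-assignment lines and verifies they match pure data parallelism. The regex and the sample log_text are assumptions for this illustration only, not a vLLM API; paste your own log lines into log_text:

```python
import re

# Quick sanity check over pasted log lines: with dp=8 and tp=pp=1,
# the EP rank should simply mirror the DP rank on every process.
pattern = re.compile(
    r"rank (\d+) in world size (\d+) is assigned as "
    r"DP rank (\d+), PP rank (\d+), TP rank (\d+), EP rank (\d+)"
)

log_text = """\
rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
rank 1 in world size 8 is assigned as DP rank 1, PP rank 0, TP rank 0, EP rank 1
"""

for m in pattern.finditer(log_text):
    rank, world, dp, pp, tp, ep = map(int, m.groups())
    assert pp == 0 and tp == 0, "expected no PP/TP splitting"
    assert dp == ep == rank, "pure DP: global, DP and EP ranks should match"

print("log layout is consistent with pure data parallelism (DP=8, TP=PP=1)")
```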
