Started with DP 8 and without enabling --enable_expert_parallel, so why do the logs still print an EP rank?

[parallel_state.py:1083] rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_6 pid=33041) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_1 pid=33036) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 1 in world size 8 is assigned as DP rank 1, PP rank 0, TP rank 0, EP rank 1
(EngineCore_2 pid=33037) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 2 in world size 8 is assigned as DP rank 2, PP rank 0, TP rank 0, EP rank 2
(EngineCore_5 pid=33040) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_4 pid=33039) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_3 pid=33038) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_7 pid=33042) INFO 11-24 15:31:05 [cuda_communicator.py:77] Using naive all2all manager.
(EngineCore_6 pid=33041) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 6 in world size 8 is assigned as DP rank 6, PP rank 0, TP rank 0, EP rank 6
(EngineCore_5 pid=33040) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 5 in world size 8 is assigned as DP rank 5, PP rank 0, TP rank 0, EP rank 5
(EngineCore_4 pid=33039) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 4 in world size 8 is assigned as DP rank 4, PP rank 0, TP rank 0, EP rank 4
(EngineCore_3 pid=33038) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 3 in world size 8 is assigned as DP rank 3, PP rank 0, TP rank 0, EP rank 3
(EngineCore_7 pid=33042) INFO 11-24 15:31:05 [parallel_state.py:1083] rank 7 in world size 8 is assigned as DP rank 7, PP rank 0, TP rank 0, EP rank 7

This log means that in your vLLM distributed setup with 8 processes (world size 8), each process is assigned a unique Data Parallel (DP) rank from 0 to 7, while all have Pipeline Parallel (PP) rank 0, Tensor Parallel (TP) rank 0, and an Expert Parallel (EP) rank matching its DP rank. This indicates pure data parallelism: each process holds a full model replica, and the model is not split across GPUs by tensor or pipeline parallelism. The EP rank shows up even though --enable_expert_parallel is off because this rank-assignment line in parallel_state.py always reports the EP rank alongside the DP/PP/TP ranks; seeing an EP rank in the log does not by itself mean expert parallelism is active. The “naive all2all manager” message from cuda_communicator.py is likewise the default communication backend chosen for this configuration.
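As a rough illustration of why the EP rank equals the DP rank here, below is a minimal sketch of one common rank-decomposition layout. This is not vLLM's actual group-construction code (that lives in vllm/distributed/parallel_state.py); it only shows how, with TP = PP = 1, the EP rank collapses to the DP rank:

```python
# Illustrative sketch (not vLLM's actual code): one common way to decompose a
# global rank into DP/PP/TP ranks, with the EP group spanning DP x TP.
def decompose_rank(rank: int, dp_size: int, pp_size: int, tp_size: int):
    tp_rank = rank % tp_size
    pp_rank = (rank // tp_size) % pp_size
    dp_rank = rank // (tp_size * pp_size)
    # The EP group typically spans the DP x TP ranks, so with
    # tp_size == pp_size == 1 the EP rank is just the DP rank.
    ep_rank = dp_rank * tp_size + tp_rank
    return dp_rank, pp_rank, tp_rank, ep_rank

for r in range(8):  # world size 8, dp=8, pp=1, tp=1, as in the log above
    dp, pp, tp, ep = decompose_rank(r, dp_size=8, pp_size=1, tp_size=1)
    print(f"rank {r}: DP {dp}, PP {pp}, TP {tp}, EP {ep}")
```

Running this reproduces exactly the assignments in your log: DP rank, EP rank, and global rank all coincide, while PP and TP ranks stay 0.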

This assignment is expected if you set --data-parallel-size=8 and left the tensor and pipeline parallel sizes at their default of 1. For more details on how these ranks are assigned and what they mean, see the explanations in parallel_state.py and the API documentation. Would you like a step-by-step breakdown of how these groups are constructed?
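If you want to double-check a run, here is a small sketch that parses the rank-assignment lines and verifies they match pure data parallelism. The regex and the sample log_text are assumptions for this illustration only, not a vLLM API; paste your own log lines into log_text:

```python
import re

# Quick sanity check over pasted log lines: with dp=8 and tp=pp=1,
# the EP rank should simply mirror the DP rank on every process.
pattern = re.compile(
    r"rank (\d+) in world size (\d+) is assigned as "
    r"DP rank (\d+), PP rank (\d+), TP rank (\d+), EP rank (\d+)"
)

log_text = """\
rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
rank 1 in world size 8 is assigned as DP rank 1, PP rank 0, TP rank 0, EP rank 1
"""

for m in pattern.finditer(log_text):
    rank, world, dp, pp, tp, ep = map(int, m.groups())
    assert pp == 0 and tp == 0, "expected no PP/TP splitting"
    assert dp == ep == rank, "pure DP: global, DP and EP ranks should match"

print("log layout is consistent with pure data parallelism (DP=8, TP=PP=1)")
```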
