The determine_expert_map function in vllm/model_executor/layers/fused_moe/layer.py calculates how many experts are assigned to each rank for expert parallelism (EP) and creates a mapping from global to local expert indices. Experts are distributed as evenly as possible across ranks, with any remainder assigned to the first few ranks. If ep_size is 1, all experts are local and no mapping is needed; otherwise, a tensor maps global expert indices to local ones, with -1 for experts not assigned to the current rank. The function returns a tuple: (local_num_experts, expert_map).
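As a rough illustration of the behavior described above, here is a minimal sketch in plain Python. It is not the vLLM implementation: the name sketch_expert_map is hypothetical, a Python list stands in for the tensor, returning None when ep_size is 1 is an assumption about how "no mapping" is represented, and assigning each rank a contiguous block of global indices is likewise an assumption.

```python
from typing import List, Optional, Tuple


def sketch_expert_map(
    ep_size: int, ep_rank: int, global_num_experts: int
) -> Tuple[int, Optional[List[int]]]:
    """Hypothetical sketch; returns (local_num_experts, expert_map)."""
    # With a single EP rank every expert is local, so no map is built.
    # (Returning None here is an assumption, not confirmed by the text above.)
    if ep_size == 1:
        return global_num_experts, None

    # Split as evenly as possible; the first `remainder` ranks get one extra.
    base, remainder = divmod(global_num_experts, ep_size)
    local_num_experts = base + (1 if ep_rank < remainder else 0)

    # Assumed layout: each rank owns a contiguous block of global indices.
    start = ep_rank * base + min(ep_rank, remainder)

    # -1 marks experts that are not assigned to this rank.
    expert_map = [-1] * global_num_experts
    for local_idx in range(local_num_experts):
        expert_map[start + local_idx] = local_idx
    return local_num_experts, expert_map


# Example: 10 experts over 4 ranks gives per-rank counts of 3, 3, 2, 2.
counts = [sketch_expert_map(4, rank, 10)[0] for rank in range(4)]
```

Under these assumptions, rank 1 in the example owns global experts 3 to 5, which map to local indices 0 to 2; every other entry in its map is -1.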
For more details, see the function docstring and implementation in vllm/model_executor/layers/fused_moe/layer.py and the API documentation at vllm.model_executor.layers.fused_moe.layer.determine_expert_map.
Would you like a step-by-step breakdown or code example?