How to output selected expert IDs of prefilling?

guo-yanpei · December 25, 2025, 8:27am

I created a bucket to store `topk_ids` in the `select_experts()` function in `vllm/model_executor/layers/fused_moe/layer.py`, but it seems to capture only the expert IDs of the decoding phase. How can I output the expert IDs of the prefilling phase?

RunLLM · December 25, 2025, 8:28am

In vLLM, the `select_experts()` function in `vllm/model_executor/layers/fused_moe/layer.py` is called during both the prefilling and decoding phases, so `topk_ids` should be available in both. If your bucket only captures expert IDs during decoding, the cause is likely where or how you invoke the bucket or store the results, not a limitation of `select_experts()` itself. Both phases route tokens through this function, as shown in the source code.

To capture expert IDs during prefilling, make sure your bucket logic is active and correctly hooked during the prefill phase as well. Prefilling and decoding are separated at the engine/request level, not at the MoE layer, so any logic inside `select_experts()` runs for both phases if it is properly instrumented. For more on the prefill/decode split, see the disaggregated prefilling design.
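To illustrate the point that routing logic runs the same way for both phases, here is a minimal pure-Python sketch, not vLLM code. The names `topk_ids`, `ExpertIdRecorder` and `select_experts_sketch` are hypothetical, the sorted-index selection stands in for `torch.topk` over router logits, and the real `select_experts()` signature differs. The recorder logs one entry per routing call regardless of phase; counting tokens per call is only a rough hint (a prefill call carries all prompt tokens, a decode step one token per running sequence), and it is not a reliable phase label once chunked prefill or mixed batches are involved.

```python
from dataclasses import dataclass, field


def topk_ids(router_logits: list[list[float]], k: int) -> list[list[int]]:
    """Indices of the k largest logits per token (ties broken by lower index)."""
    return [
        sorted(range(len(row)), key=lambda e: (-row[e], e))[:k]
        for row in router_logits
    ]


@dataclass
class ExpertIdRecorder:
    """Hypothetical bucket: one entry per routing call, whatever the phase."""
    calls: list[list[list[int]]] = field(default_factory=list)

    def record(self, ids: list[list[int]]) -> None:
        # Store a copy so that later in-place reuse of `ids` by the caller
        # (an assumption about how buffers may be reused) cannot alter the log.
        self.calls.append([list(row) for row in ids])

    def tokens_per_call(self) -> list[int]:
        return [len(call) for call in self.calls]


RECORDER = ExpertIdRecorder()


def select_experts_sketch(router_logits: list[list[float]], top_k: int) -> list[list[int]]:
    """Hypothetical stand-in for the routing step: select, record, return."""
    ids = topk_ids(router_logits, top_k)
    RECORDER.record(ids)
    return ids


# Prefill: a single call covering all prompt tokens (3 tokens, 4 experts).
prefill_logits = [
    [0.1, 0.9, 0.3, 0.2],
    [0.8, 0.1, 0.7, 0.0],
    [0.2, 0.3, 0.1, 0.6],
]
select_experts_sketch(prefill_logits, top_k=2)

# Decode: one call per step with a single new token.
select_experts_sketch([[0.5, 0.4, 0.9, 0.1]], top_k=2)

print(RECORDER.tokens_per_call())  # [3, 1]
```

Because the record is made inside the routing function itself, nothing phase-specific is needed; if only decode-sized entries show up, the hook is not being reached during prefill.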

Would you like a step-by-step example of how to instrument this for both phases?



guo-yanpei · December 25, 2025, 8:41am

When I use `enforce_eager`, it does output the expert IDs of the prefilling phase. Could you analyze the reason?
