How to output the selected expert IDs during prefilling?

I created a bucket to store `topk_ids` inside the `select_experts()` function in `vllm/model_executor/layers/fused_moe/layer.py`, but it seems to capture only the expert IDs from the decoding phase. How can I output the expert IDs from the prefilling phase?

In vLLM, the `select_experts()` function in `vllm/model_executor/layers/fused_moe/layer.py` is called during both the prefilling and decoding phases, so `topk_ids` should be available in both. If your bucket only captures expert IDs during decoding, the cause is likely where or how you invoke the function or store the results, not a limitation of `select_experts()` itself. Both phases route tokens through this function, as shown in the source code.

To capture expert IDs during prefilling, make sure your bucket logic is active and correctly hooked during the prefill phase as well. The prefilling and decoding phases are separated at the engine/request level, not at the MoE layer, so any logic inside `select_experts()` will run for both phases if it is properly instrumented. For more on the prefill/decode split, see the disaggregated prefilling design.
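As a rough illustration of the instrumentation idea above, here is a minimal sketch in plain Python. Everything in it is hypothetical: `ExpertIdBucket`, `record_topk_ids`, and `fake_select_experts` are illustrative names, not vLLM APIs, and the wrapper assumes the routed function returns a tuple whose second element is `topk_ids`. The phase label is set by hand here because, as noted above, the layer itself does not know which phase it is in.

```python
import functools
from collections import defaultdict


class ExpertIdBucket:
    """Hypothetical recorder: stores topk_ids grouped by a phase label."""

    def __init__(self):
        self.records = defaultdict(list)
        self.phase = "unknown"  # set by the caller; the MoE layer does not know it

    def add(self, topk_ids):
        # Tensors expose .tolist(); plain sequences are copied as-is.
        ids = topk_ids.tolist() if hasattr(topk_ids, "tolist") else list(topk_ids)
        self.records[self.phase].append(ids)


def record_topk_ids(bucket):
    """Wrap a routing function and record the expert IDs it returns."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            # Assumption: result is (topk_weights, topk_ids, ...).
            bucket.add(result[1])
            return result

        return wrapper

    return decorator


def fake_select_experts(router_logits, top_k):
    """Stand-in router (not vLLM's): pick the top_k highest-scoring experts per token."""
    topk_weights, topk_ids = [], []
    for scores in router_logits:
        ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]
        topk_ids.append(ranked)
        topk_weights.append([scores[e] for e in ranked])
    return topk_weights, topk_ids


bucket = ExpertIdBucket()
select_experts = record_topk_ids(bucket)(fake_select_experts)

bucket.phase = "prefill"  # two prompt tokens routed in one call
select_experts([[0.1, 0.7, 0.2], [0.5, 0.3, 0.9]], top_k=2)

bucket.phase = "decode"  # one generated token
select_experts([[0.4, 0.1, 0.6]], top_k=2)

print(dict(bucket.records))
```

The same wrapper could in principle be applied to the real routing function, but the return signature should be checked against your vLLM version first. Keep in mind that a recorder like this only captures calls in which the Python wrapper itself actually executes.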

Would you like a step-by-step example of how to instrument this for both phases?

When I use `enforce_eager`, it can output the expert IDs. Could you explain why?