Can vLLM return expert selection info to support routing replay?

For RL training of MoE models such as Qwen3-MoE, routing replay has proven to be a useful trick for stabilizing training.

However, it requires the inference engine to return the router's expert selections at each layer. Does vLLM support this?

vLLM does not natively support returning router selections (router logits or expert assignments) for each layer during inference. There is no built-in API or flag for extracting this information during generation for MoE models such as Qwen3-MoE. Users seeking this functionality have modified the model code to save router logits or expert assignments to a file, as shown in external projects like SteerMoE; custom code changes to the model's MoE layers are required to capture and store routing decisions during inference.
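To illustrate the kind of modification involved, here is a minimal sketch of a recorder that wraps a router (gate) function and logs each token's top-k expert indices per layer. This is not a vLLM API: `RoutingRecorder`, `wrap_router`, and the toy router below are hypothetical names, and a real integration would patch the gate forward inside the model code (e.g. the Qwen3-MoE gate projection) rather than use a standalone NumPy function.

```python
import numpy as np

# Hypothetical recorder -- NOT part of vLLM. It shows the general pattern:
# intercept the router's logits and log the selected expert ids per layer.
class RoutingRecorder:
    def __init__(self):
        # Each record is (layer_idx, token_idx, [expert ids]).
        self.records = []

    def wrap_router(self, layer_idx, router_fn, top_k=2):
        def wrapped(hidden_states):
            # Router produces logits of shape (num_tokens, num_experts).
            logits = router_fn(hidden_states)
            # argpartition picks the top_k expert ids per token (unordered).
            expert_ids = np.argpartition(logits, -top_k, axis=-1)[:, -top_k:]
            for tok, ids in enumerate(expert_ids):
                self.records.append((layer_idx, tok, sorted(int(i) for i in ids)))
            return logits
        return wrapped

# Toy gate standing in for a real MoE router: hidden_dim=8, num_experts=4.
rng = np.random.default_rng(0)
gate_weight = rng.standard_normal((8, 4))

def toy_router(h):
    return h @ gate_weight

recorder = RoutingRecorder()
router = recorder.wrap_router(layer_idx=0, router_fn=toy_router, top_k=2)

hidden = rng.standard_normal((3, 8))  # 3 tokens
router(hidden)
print(recorder.records)  # one (layer, token, experts) record per token
```

In a real setup the same wrapping would be applied to every MoE layer's gate, and the accumulated records written out per request so the RL trainer can replay them.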

Would you like more detail on how to implement such a modification?
