Can vLLM return expert selection info to support routing replay?

For RL training of MoE models such as Qwen3-MoE, routing replay has proven to be a useful trick for stabilizing training.

However, it requires the inference engine to return the router's expert selections at each layer. Does vLLM support this?

vLLM does not natively support returning router selections (router logits or expert assignments) for each layer during inference. There is no built-in API or flag for extracting this information during generation for MoE models such as Qwen3-MoE. Users seeking this functionality have modified the model code to save router logits or expert assignments to a file, as shown in external projects like SteerMoE; custom code changes to the model's MoE layers are required to capture and store routing decisions during inference.
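To illustrate the kind of modification involved, here is a minimal sketch of a recorder that wraps a router (gate) function and logs each token's top-k expert indices per layer. This is not a vLLM API: `RoutingRecorder`, `wrap_router`, and the toy router below are hypothetical names, and a real integration would patch the gate forward inside the model code (e.g. the Qwen3-MoE gate projection) rather than use a standalone NumPy function.

```python
import numpy as np

# Hypothetical recorder -- NOT part of vLLM. It shows the general pattern:
# intercept the router's logits and log the selected expert ids per layer.
class RoutingRecorder:
    def __init__(self):
        # Each record is (layer_idx, token_idx, [expert ids]).
        self.records = []

    def wrap_router(self, layer_idx, router_fn, top_k=2):
        def wrapped(hidden_states):
            # Router produces logits of shape (num_tokens, num_experts).
            logits = router_fn(hidden_states)
            # argpartition picks the top_k expert ids per token (unordered).
            expert_ids = np.argpartition(logits, -top_k, axis=-1)[:, -top_k:]
            for tok, ids in enumerate(expert_ids):
                self.records.append((layer_idx, tok, sorted(int(i) for i in ids)))
            return logits
        return wrapped

# Toy gate standing in for a real MoE router: hidden_dim=8, num_experts=4.
rng = np.random.default_rng(0)
gate_weight = rng.standard_normal((8, 4))

def toy_router(h):
    return h @ gate_weight

recorder = RoutingRecorder()
router = recorder.wrap_router(layer_idx=0, router_fn=toy_router, top_k=2)

hidden = rng.standard_normal((3, 8))  # 3 tokens
router(hidden)
print(recorder.records)  # one (layer, token, experts) record per token
```

In a real setup the same wrapping would be applied to every MoE layer's gate, and the accumulated records written out per request so the RL trainer can replay them.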

Would you like more detail on how to implement such a modification?
