As the title suggests, do the current MoE models support setting LoRA adapters on expert layers? It does not seem to be supported, but I couldn't find a clear statement in the documentation.
No, current MoE models in vLLM do not support LoRA adapters on expert layers. The main blocker is that the FusedMoE implementation has no LoRA support, a limitation confirmed for models such as Qwen 3 MoE, DeepSeek V2, and Llama4 MoE. There are plans to add support, but it has not been implemented yet, and any documentation suggesting otherwise is incorrect or outdated.

Only the non-expert layers (the language backbone, e.g. attention projections) can take LoRA in some cases; the expert layers themselves cannot. Attempts to load a LoRA adapter that targets expert layers will not work and typically produce errors or warnings during model loading or inference, as discussed in several issues and pull requests by the vLLM team and users.

For now, LoRA adapters should not be applied to MoE expert layers in vLLM; contributions to add this feature are welcome but are not yet in the main codebase or any release. The common workaround, covered in the linked discussions, is to apply LoRA only to the attention layers, not to the experts.
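As an illustration of the attention-only workaround, here is a minimal sketch of serving a MoE base model with an adapter whose target modules exclude the expert layers. The model name, adapter name, and adapter path below are placeholders, and the adapter is assumed to have been trained with its target modules restricted to attention projections (e.g. q_proj, k_proj, v_proj, o_proj):

```python
# Sketch only: model name, adapter name, and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora applies LoRA only to supported (non-expert) modules;
# the FusedMoE expert weights are left untouched.
llm = LLM(
    model="Qwen/Qwen3-30B-A3B",  # example MoE base model (assumption)
    enable_lora=True,
    max_lora_rank=16,
)

outputs = llm.generate(
    ["Explain mixture-of-experts routing in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
    # The adapter at this path must target only attention projections
    # (e.g. q_proj/k_proj/v_proj/o_proj), not the expert layers.
    lora_request=LoRARequest("attn_only_adapter", 1, "/path/to/attention-only-adapter"),
)
print(outputs[0].outputs[0].text)
```

If the adapter's config instead lists expert modules (for example, per-expert gate/up/down projections) among its target modules, loading it this way will run into the FusedMoE limitation described above.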
Would you like a step-by-step breakdown or more technical details on the current limitations and possible workarounds?
Sources:
- Feature: Qwen 3 MoE Lora adapter support
- Feature: LoRA fine-tuning model for DeepSeek V2
- Feature: Use LoRA on MoE models
- LoRA support on llama4
Hint: Mention @RunLLM in the post for followups.