As the title suggests, do the current MoE models support setting LoRA adapters on expert layers? It does not seem to be supported, but I couldn't find a clear statement in the documentation.
No, current MoE models in vLLM do not support LoRA adapters on expert layers. The main blocker is that the FusedMoE implementation has no LoRA support, a limitation confirmed for models such as Qwen 3 MoE, DeepSeek V2, and Llama4 MoE. There are plans to add support, but it has not been implemented yet, and any documentation suggesting otherwise is incorrect or outdated.

Only the non-expert layers (the language backbone, e.g. attention projections) can take LoRA in some cases; the expert layers themselves cannot. Attempts to load a LoRA adapter that targets expert layers will not work and typically produce errors or warnings during model loading or inference, as discussed in several issues and pull requests by the vLLM team and users.

For now, LoRA adapters should not be applied to MoE expert layers in vLLM; contributions to add this feature are welcome but are not yet in the main codebase or any release. The common workaround, covered in the linked discussions, is to apply LoRA only to the attention layers, not to the experts.
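As an illustration of the attention-only workaround, here is a minimal sketch of serving a MoE base model with an adapter whose target modules exclude the expert layers. The model name, adapter name, and adapter path below are placeholders, and the adapter is assumed to have been trained with its target modules restricted to attention projections (e.g. q_proj, k_proj, v_proj, o_proj):

```python
# Sketch only: model name, adapter name, and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora applies LoRA only to supported (non-expert) modules;
# the FusedMoE expert weights are left untouched.
llm = LLM(
    model="Qwen/Qwen3-30B-A3B",  # example MoE base model (assumption)
    enable_lora=True,
    max_lora_rank=16,
)

outputs = llm.generate(
    ["Explain mixture-of-experts routing in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
    # The adapter at this path must target only attention projections
    # (e.g. q_proj/k_proj/v_proj/o_proj), not the expert layers.
    lora_request=LoRARequest("attn_only_adapter", 1, "/path/to/attention-only-adapter"),
)
print(outputs[0].outputs[0].text)
```

If the adapter's config instead lists expert modules (for example, per-expert gate/up/down projections) among its target modules, loading it this way will run into the FusedMoE limitation described above.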
Would you like a step-by-step breakdown or more technical details on the current limitations and possible workarounds?
Sources:
- Feature: Qwen 3 MoE Lora adapter support
- Feature: LoRA fine-tuning model for DeepSeek V2
- Feature: Use LoRA on MoE models
- LoRA support on llama4
Hint: Mention @RunLLM in the post for followups.