Why does V1 not support Mamba models?

Hi team,
While reading the documentation, I noticed that V1 currently does not support Mamba models because they use selective state-space mechanisms rather than standard transformer attention. I'm curious — what would be the main technical challenges or limitations in adapting Mamba models (e.g., MambaForCausalLM, JambaForCausalLM) to be compatible with V1?

Here is the excerpt from the documentation:

Mamba Models
Models using selective state-space mechanisms (instead of standard transformer attention) are not yet supported (e.g., MambaForCausalLM, JambaForCausalLM).

Thanks in advance for your insights!

It seems likely that the main technical challenge stems from how these models manage state. vLLM V1 is architected around decoder-only transformers, where each request accumulates a per-token KV cache that grows with the sequence and that V1 manages in fixed-size blocks. Models built on selective state-space mechanisms, such as MambaForCausalLM and JambaForCausalLM, instead carry a constant-size recurrent state per layer that is updated in place at every decoding step, so V1's block-based cache management and scheduling logic do not map onto them directly. Supporting them would require specialized handling of these internal states and likely some architectural changes in V1.
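To make that difference concrete, here is a minimal sketch (plain PyTorch, not vLLM code; the shapes and the "selective" update rule are illustrative only) contrasting what has to be kept per request: a transformer layer appends to a KV cache that grows with every decoded token, while a Mamba-style layer updates a fixed-size recurrent state in place.

```python
import torch

batch, d_model, d_state = 1, 16, 4

# Transformer attention: the KV cache gains one entry per decoded token,
# which is the growth pattern vLLM's paged KV-cache management is built around.
kv_cache_k, kv_cache_v = [], []
for step in range(8):
    k = torch.randn(batch, d_model)
    v = torch.randn(batch, d_model)
    kv_cache_k.append(k)
    kv_cache_v.append(v)
print("KV cache entries after 8 steps:", len(kv_cache_k))  # 8, and growing

# Selective SSM (Mamba-style, heavily simplified): the per-request state has a
# constant size no matter how many tokens have been processed.
ssm_state = torch.zeros(batch, d_model, d_state)
for step in range(8):
    x = torch.randn(batch, d_model)
    # Hypothetical input-dependent ("selective") decay and input gates.
    decay = torch.sigmoid(torch.randn(batch, d_model, d_state))
    gate = torch.randn(batch, d_model, d_state)
    ssm_state = decay * ssm_state + gate * x.unsqueeze(-1)
print("SSM state shape after 8 steps:", tuple(ssm_state.shape))  # constant
```

Because the SSM state never grows, the token-granular paged KV cache that V1 is built around does not apply to these layers as-is.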

My understanding is that the vLLM team is working toward supporting a broader range of models, including those with hybrid or non-standard attention mechanisms, in future updates. This involves developing new memory management and cache handling strategies that can efficiently accommodate the fixed-size recurrent state these models keep alongside, or instead of, a KV cache. For more details, you might want to check the vLLM documentation and the related GitHub issues tracking this work.
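As a rough illustration of what such memory management strategies might involve, here is a hypothetical sketch; the class and method names (PagedKVManager, MambaStateManager, append_token, add_request) are invented for this example and are not vLLM's actual scheduler or cache-manager API. It contrasts block-granular allocation for a growing KV cache with request-granular allocation of a constant-size state slot.

```python
class PagedKVManager:
    """Toy block-granular allocator for a growing transformer KV cache."""

    def __init__(self, num_blocks: int, block_size: int):
        self.free_blocks = list(range(num_blocks))
        self.block_size = block_size
        self.block_tables: dict[int, list[int]] = {}

    def append_token(self, request_id: int, num_tokens_so_far: int) -> None:
        # Allocate a new block whenever the sequence crosses a block boundary.
        table = self.block_tables.setdefault(request_id, [])
        if num_tokens_so_far % self.block_size == 0:
            table.append(self.free_blocks.pop())


class MambaStateManager:
    """Toy request-granular allocator for fixed-size recurrent state."""

    def __init__(self, max_running_requests: int):
        self.free_slots = list(range(max_running_requests))
        self.slot_of: dict[int, int] = {}

    def add_request(self, request_id: int) -> None:
        # One constant-size state slot per request, independent of length.
        self.slot_of[request_id] = self.free_slots.pop()

    def step(self, request_id: int) -> None:
        # Decoding another token updates the state in place; nothing grows.
        pass
```

Decisions V1 currently makes at block granularity (allocation, eviction, prefix caching) would presumably need request-level, in-place-state counterparts for Mamba layers, which is likely part of why support is still pending.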

Would you like more detail on any specific aspect?
