(APIServer pid=22031) WARNING 03-17 22:16:08 [config.py:381] Mamba cache mode is set to 'align' for Qwen3_5ForConditionalGeneration by default when prefix caching is enabled
(APIServer pid=22031) INFO 03-17 22:16:08 [config.py:401] Warning: Prefix caching in Mamba cache 'align' mode is currently enabled. Its support for Mamba layers is experimental. Please report any issues you may observe.

In this log, what is Mamba?
Mamba is a type of neural network layer used in some large language models (LLMs) as an alternative or complement to traditional attention mechanisms. Mamba layers implement state space models (SSMs), which can efficiently model long-range dependencies in sequences, and they are often used in "hybrid" architectures alongside attention layers. In vLLM, Mamba layers require special handling for caching and prefix reuse, which is why you see experimental-support warnings when enabling prefix caching with Mamba-based models such as Qwen3.5. For these models, the "align" cache mode is used by default when prefix caching is enabled, but support is still under development and may not be fully stable or optimized yet. See the MambaBase documentation and the Qwen3.5 configuration tips for more details.
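For intuition, the core of an SSM layer can be sketched as a per-token recurrence over a fixed-size hidden state. This is a toy linear SSM in plain NumPy, not the actual Mamba kernel (real Mamba uses input-dependent "selective" parameters and a hardware-aware parallel scan); it only illustrates why the per-sequence cache is constant-size, unlike attention's growing KV cache:

```python
# Toy linear state-space recurrence (illustrative sketch, not Mamba itself).
import numpy as np

def ssm_scan(x, A, B, C):
    """Compute y_t = C @ h_t where h_t = A @ h_{t-1} + B * x_t."""
    h = np.zeros(A.shape[0])  # fixed-size state: this is all that must be cached
    ys = []
    for x_t in x:             # O(L) sequential scan over the token sequence
        h = A @ h + B * x_t   # state update; state size does not grow with L
        ys.append(C @ h)      # readout for this position
    return np.array(ys)

rng = np.random.default_rng(0)
A = np.diag(rng.uniform(0.5, 0.95, size=4))  # stable per-channel decay
B = rng.normal(size=4)
C = rng.normal(size=4)
y = ssm_scan(rng.normal(size=16), A, B, C)
print(y.shape)  # one output per input token
```

Because the recurrent state `h` is a fixed-size vector rather than a per-token KV cache, reusing a cached prefix means restoring the state at the prefix boundary, which is why prefix caching needs the special handling the warning refers to.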
Would you like a deeper technical explanation of how Mamba layers work or how they interact with caching in vLLM?
Sources:
- https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/abstract/
- Qwen3.5 configuration tips
Hint: Mention @RunLLM in the post for followups.
Does GLM5 include Mamba layers?