Does vLLM support Qwen3.5 PD disaggregation with Mooncake?

I started a vLLM 0.20.1 server with model=Qwen3.5-397B using this command:

vllm serve $model --trust-remote-code -tp 8 --port 8020 --kv-transfer-config '{"kv_connector":"MooncakeConnector","kv_role":"kv_producer"}' --speculative-config.method mtp --speculative-config.num_speculative_tokens 2 --tool-call-parser qwen3_coder --reasoning-parser qwen3 --enable-auto-tool-choice --no-disable-hybrid-kv-cache-manager

But a NotImplementedError occurred:

File "/usr/local/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/utils.py", line 450, in __post_init__
  kv_cache_shape: tuple[int, ...] = attn_backend.get_kv_cache_shape(

File "/usr/local/lib/python3.12/site-packages/vllm/v1/attention/backend.py", line 96, in get_kv_cache_shape
  raise NotImplementedError

The error occurs because the attention backend selected for your model does not implement get_kv_cache_shape. The KV connector utilities call that method (utils.py, line 450 in your traceback) when setting up KV cache transfer with MooncakeConnector on vLLM 0.20.1, and the version reached in vllm/v1/attention/backend.py raises NotImplementedError. This appears to be a known limitation: some attention backends, especially those for very new or large models, do not support all distributed KV cache features. There is no workaround unless the backend adds this method or you switch to a supported connector or model/attention backend combination. See the related issue for context: mooncakeconnector assertion error.
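To illustrate the failure mode, here is a minimal, self-contained sketch of the pattern the traceback suggests. It does not import vLLM, and every class name, the helper supports_kv_shape, and the simplified method signature are hypothetical stand-ins rather than vLLM's actual code. It only shows how a backend that inherits a raising default fails as soon as connector setup asks for the cache shape.

```python
class AttentionBackendBase:
    """Hypothetical stand-in for a base attention backend class."""

    @staticmethod
    def get_kv_cache_shape(num_blocks, block_size, num_kv_heads, head_size):
        # Default behaviour: subclasses are expected to override this.
        raise NotImplementedError


class PagedBackend(AttentionBackendBase):
    """Hypothetical backend that reports a paged KV cache layout."""

    @staticmethod
    def get_kv_cache_shape(num_blocks, block_size, num_kv_heads, head_size):
        # Leading 2 = one slab for keys, one for values (assumed layout).
        return (2, num_blocks, block_size, num_kv_heads, head_size)


class BackendWithoutShape(AttentionBackendBase):
    """Hypothetical backend that never overrides get_kv_cache_shape."""


def supports_kv_shape(backend) -> bool:
    """Return True if the backend can report a KV cache shape."""
    try:
        backend.get_kv_cache_shape(1, 16, 1, 64)
    except NotImplementedError:
        return False
    return True


if __name__ == "__main__":
    for backend in (PagedBackend, BackendWithoutShape):
        print(backend.__name__, supports_kv_shape(backend))
```

A probe like supports_kv_shape is just one way to find out which backend is the one that raises; the traceback alone does not say which backend class was selected for your model.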

Would you like more detail on possible alternatives or debugging steps?
