I ran a vLLM server (vLLM 0.20.1) with model=Qwen3.5-397B. This is my server command:
vllm serve $model --trust-remote-code -tp 8 --port 8020 --kv-transfer-config '{"kv_connector":"MooncakeConnector","kv_role":"kv_producer"}' --speculative-config.method mtp --speculative-config.num_speculative_tokens 2 --tool-call-parser qwen3_coder --reasoning-parser qwen3 --enable-auto-tool-choice --no-disable-hybrid-kv-cache-manager
But a NotImplementedError occurred during startup:
File "/usr/local/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/utils.py", line 450, in __post_init__
kv_cache_shape: tuple[int, ...] = attn_backend.get_kv_cache_shape(
File "/usr/local/lib/python3.12/site-packages/vllm/v1/attention/backend.py", line 96, in get_kv_cache_shape
raise NotImplementedError
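
My reading of the traceback, which may be wrong: the KV connector setup calls get_kv_cache_shape on an attention backend that does not override it, so the call falls through to a base-class stub that just raises. Below is a toy sketch of that pattern. All class names and the shape tuple are hypothetical and made up for illustration; this is not vLLM's actual code.

```python
# Toy illustration only: hypothetical classes, not vLLM's real backends.
class BaseBackend:
    @staticmethod
    def get_kv_cache_shape(num_blocks: int, block_size: int) -> tuple[int, ...]:
        # Base-class stub: subclasses are expected to override this.
        raise NotImplementedError


class FullAttentionBackend(BaseBackend):
    @staticmethod
    def get_kv_cache_shape(num_blocks: int, block_size: int) -> tuple[int, ...]:
        # Made-up layout: (key/value, blocks, tokens per block).
        return (2, num_blocks, block_size)


class NoShapeBackend(BaseBackend):
    # Does not override get_kv_cache_shape, so the base stub is used.
    pass


def describe(backend: type[BaseBackend]) -> str:
    """Return the shape as a string, or the error name if the stub is hit."""
    try:
        return str(backend.get_kv_cache_shape(4, 16))
    except NotImplementedError:
        return "NotImplementedError"


if __name__ == "__main__":
    print(describe(FullAttentionBackend))
    print(describe(NoShapeBackend))
```

If this is what is happening, the question is which backend in my configuration is missing the override, and whether the connector is supposed to call it for that layer type at all.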