Is KV Transfer Config supported in V1 for running disaggregated benchmarks?

I am trying to run the disagg_benchmark in vLLM, but I get this error. Is KV Transfer Config supported in V1 for running disaggregated benchmarks?

Command
python3 -m vllm.entrypoints.openai.api_server \
    --model $model \
    --port 8100 \
    --max-model-len 10000 \
    --gpu-memory-utilization 0.6 \
    --trust-remote-code \
    --kv-transfer-config '{"kv_connector":"SharedStorageConnector","kv_role":"kv_both","kv_connector_extra_config":{"shared_storage_path": "local_storage"}}' &

Error:
NotImplementedError: VLLM_USE_V1=1 is not supported with --kv-transfer-config

No, --kv-transfer-config is not supported in vLLM V1. When you run with V1 enabled (VLLM_USE_V1=1), attempting to use --kv-transfer-config will result in a NotImplementedError, and vLLM will either fall back to V0 or fail to start, depending on the version and context. This is a known limitation, and disaggregated KV cache transfer (including for benchmarks) is only supported in V0 as of now. See the explicit warning and user reports in this issue and this usage discussion.

If you need to run disaggregated benchmarks, set VLLM_USE_V1=0 to force V0, or use a V0-compatible connector and configuration; a minimal sketch is shown below. Would you like more detail on how to set up disaggregated prefill/decode in V0?
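For example, a minimal sketch that reuses your original command unchanged and only forces the V0 engine via the environment variable (the model, port, and SharedStorageConnector settings are taken from your command, not a recommendation):

VLLM_USE_V1=0 python3 -m vllm.entrypoints.openai.api_server \
    --model $model \
    --port 8100 \
    --max-model-len 10000 \
    --gpu-memory-utilization 0.6 \
    --trust-remote-code \
    --kv-transfer-config '{"kv_connector":"SharedStorageConnector","kv_role":"kv_both","kv_connector_extra_config":{"shared_storage_path": "local_storage"}}' &

The only change is VLLM_USE_V1=0 in the environment; with V0 forced, the --kv-transfer-config flag is accepted. For a full disaggregated benchmark you would typically launch separate prefill and decode servers following the V0 disaggregated prefill examples, but that setup is beyond this sketch.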
