How is the KV cache used in a multi-GPU setup with data parallelism? Is it the same as with tensor parallelism, where the VRAM left over on both GPUs after loading the model weights is available for the KV cache?
If I run two separate 32GB 5090s behind a load balancer, is the KV cache effectively doubled, with each GPU holding its own cache? And with tp=2, is all the available VRAM used for one copy of the KV cache?
In a data parallel (DP) setup, each GPU runs a full replica of the model and maintains its own independent KV cache. If you run two separate 32GB GPUs behind a load balancer (DP=2), each GPU has its own KV cache, so total KV cache capacity is effectively doubled, but each request only uses the cache on the GPU it lands on; there is no sharing between GPUs. With tensor parallelism (TP=2), the model weights are split across both GPUs and the remaining VRAM on both is pooled into a single sharded KV cache (each GPU holds the KV entries for its slice of the attention heads), so every request can draw on the combined memory for longer contexts or more concurrent sequences.
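As a rough illustration of the capacity difference, here is a back-of-the-envelope sketch. The model size, per-token KV footprint, and overhead figures are assumed numbers for a hypothetical ~8B fp16 model, not measurements:

```python
# Rough KV cache capacity for 2x 32 GB GPUs (all sizes are assumptions).
GPU_VRAM_GB = 32
NUM_GPUS = 2
MODEL_WEIGHTS_GB = 16                        # ~8B model in fp16 (assumption)
OVERHEAD_GB = 2                              # activations, CUDA context, etc. (rough guess)
KV_GB_PER_TOKEN = 128 * 1024 / 1024**3       # ~128 KiB per token (assumption)

# Data parallel: each GPU holds the full weights; caches are independent.
dp_cache_per_gpu = GPU_VRAM_GB - MODEL_WEIGHTS_GB - OVERHEAD_GB
dp_tokens_per_gpu = dp_cache_per_gpu / KV_GB_PER_TOKEN
print(f"DP: {dp_cache_per_gpu} GB cache per GPU, "
      f"~{dp_tokens_per_gpu:,.0f} tokens per replica (not shared)")

# Tensor parallel: weights and KV cache are both sharded across the GPUs.
tp_cache_total = NUM_GPUS * (GPU_VRAM_GB - MODEL_WEIGHTS_GB / NUM_GPUS - OVERHEAD_GB)
tp_tokens_total = tp_cache_total / KV_GB_PER_TOKEN
print(f"TP: {tp_cache_total} GB pooled cache, "
      f"~{tp_tokens_total:,.0f} tokens shared by all requests")
```

The point is that TP frees the VRAM that DP spends on the second copy of the weights and pools the remainder, whereas in DP each replica's free memory is only reachable by the requests routed to that GPU.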
To summarize: DP gives each GPU its own KV cache (no sharing, but total capacity doubles across independent requests), while TP shards both the model weights and the KV cache across GPUs (one larger, shared cache). For more detail, see the vLLM docs on Data Parallel Deployment and on Optimization and Tuning.
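For concreteness, here is a minimal launch sketch for both layouts using the vLLM serve CLI, wrapped in Python only to keep the two variants side by side. The model name and ports are placeholders; adjust memory utilization and other flags to your hardware:

```python
# Sketch: two DP replicas vs one TP=2 instance (model name and ports are placeholders).
import os
import subprocess

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model

def launch_dp_replicas():
    """Data parallel: one full vLLM instance per GPU; put a load balancer in front."""
    procs = []
    for gpu_id, port in [(0, 8000), (1, 8001)]:
        env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_id)}
        procs.append(subprocess.Popen(
            ["vllm", "serve", MODEL,
             "--port", str(port),
             "--gpu-memory-utilization", "0.90"],
            env=env,
        ))
    return procs  # each replica keeps its own, independent KV cache

def launch_tp_instance():
    """Tensor parallel: one instance that shards weights and KV cache over both GPUs."""
    return subprocess.Popen(
        ["vllm", "serve", MODEL,
         "--port", "8000",
         "--tensor-parallel-size", "2",
         "--gpu-memory-utilization", "0.90"],
    )
```

With the DP layout you point an ordinary HTTP load balancer at ports 8000 and 8001; the Data Parallel Deployment page referenced above also describes vLLM's built-in data-parallel mode as an alternative to running the replicas yourself.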
Would you like a step-by-step breakdown or example launch commands?