Does vLLM support deploying multiple Docker instances on one GPU?

I have an NVIDIA H100 and want to deploy multiple small models on this GPU.
I deployed qwen2.5-omni-7b with gpu_memory_utilization = 0.5.
When I then try to deploy another model, qwen3-4b, with gpu_memory_utilization = 0.4, this error occurs:

ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization when initializing the engine.

If I stop qwen2.5-omni-7b and deploy only qwen3-4b, gpu_memory_utilization = 0.25 is enough.

This error occurs because vLLM’s gpu_memory_utilization parameter is a per-instance limit computed against the GPU’s total memory, but the startup memory check does not account for memory already allocated by other vLLM instances on the same GPU. As a result, when you start one model and then another, the second instance sees less free memory than its budget assumes, leading to the “No available memory for the cache blocks” error, even though the sum of the gpu_memory_utilization values is less than 1.0. This is a known issue; the workarounds are to stagger the startup of the services (e.g., wait ~30 seconds between launches) or to increase the gpu_memory_utilization of the second instance so its budget covers the memory the first instance already holds. See the discussion and workaround in this issue.
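To make the arithmetic concrete, here is a back-of-the-envelope sketch (a simplified model, not vLLM’s actual accounting code) of the behavior described above. The 80 GB H100 capacity and the per-model weight sizes are assumptions for illustration:

```python
# Simplified model: each instance budgets gpu_memory_utilization against
# TOTAL GPU memory, but memory already held by other processes is
# subtracted from that budget before KV-cache blocks are allocated.
TOTAL_GB = 80.0  # H100 (assumption)

def kv_cache_headroom(gpu_memory_utilization, already_used_gb, weights_gb):
    """Memory left for KV-cache blocks under this simplified model (GB)."""
    budget = gpu_memory_utilization * TOTAL_GB
    return budget - already_used_gb - weights_gb

# Instance 1: qwen2.5-omni-7b, ~15 GB of weights (assumed), empty GPU.
first = kv_cache_headroom(0.5, already_used_gb=0.0, weights_gb=15.0)
print(f"instance 1 KV-cache headroom: {first:.0f} GB")   # 25 GB -> fine

# Instance 2: qwen3-4b, ~8 GB of weights (assumed), but instance 1
# already holds its full 0.5 * 80 = 40 GB budget.
second = kv_cache_headroom(0.4, already_used_gb=40.0, weights_gb=8.0)
print(f"instance 2 KV-cache headroom: {second:.0f} GB")  # negative -> error

# qwen3-4b alone on the GPU: 0.25 leaves plenty of headroom.
alone = kv_cache_headroom(0.25, already_used_gb=0.0, weights_gb=8.0)
print(f"qwen3-4b alone: {alone:.0f} GB")                 # 12 GB -> fine
```

The negative headroom for the second instance is exactly the “No available memory for the cache blocks” situation, and it disappears when the first instance is stopped.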

If you stop the first model, the second can start with a lower gpu_memory_utilization because all GPU memory is available again. When running multiple vLLM instances on one GPU, you may need to set a higher gpu_memory_utilization for the later-started instance, or experiment with staggered startups, as described above. Would you like more detail on how to tune these parameters or on the underlying memory-allocation behavior?
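Under the same simplified model, you can estimate what gpu_memory_utilization the later-started instance needs: its budget must cover everything already allocated on the GPU plus its own weights and desired KV cache. The sizes below are illustrative assumptions, not measured values:

```python
# Hedged sketch: pick a gpu_memory_utilization for the next instance so
# its budget covers memory other instances already hold, plus its own
# weights and a target KV-cache size. All sizes are assumptions.
TOTAL_GB = 80.0  # H100 (assumption)

def adjusted_utilization(already_used_gb, weights_gb, kv_cache_gb):
    """Fraction of total GPU memory the next instance should request."""
    needed = already_used_gb + weights_gb + kv_cache_gb
    return round(needed / TOTAL_GB, 2)

# First instance holds its 40 GB budget; qwen3-4b needs ~8 GB of weights
# (assumed) and we want ~12 GB of KV cache:
util = adjusted_utilization(already_used_gb=40.0, weights_gb=8.0, kv_cache_gb=12.0)
print(util)  # 0.75
```

In this scenario you would launch the second server with `--gpu-memory-utilization 0.75` rather than 0.4; check actual usage with `nvidia-smi` before choosing the number, since real weight and activation sizes vary.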
