Hi,
I have 2 GPUs passed through to a Proxmox VM.
I use them separately in different Docker containers.
The problem is that they do not work simultaneously.
When I start both containers, some error occurs and Open WebUI can't find either of them.
When I stop one and start the other, it always works. I pass --gpus device=1 to one container and --gpus device=0 to the other. One has --network host -p 8000:8000 and the other --network host -p 8001:8000, plus different model names etc. What is the problem here?
This issue is likely due to how Docker, the GPU drivers, and vLLM interact around GPU allocation and device visibility. When running two containers, each assigned a different GPU via --gpus device=0 and --gpus device=1, vLLM should be able to use each GPU independently. Problems can still arise if the containers are not properly isolated, if they conflict over shared resources such as shared memory (/dev/shm), or if the GPU driver/NCCL versions are not compatible with a multi-container, multi-GPU setup. Also, --network host deserves a closer look: with host networking, both containers share the host's network namespace and the -p mappings are ignored, so if both vLLM servers listen on their default port (8000), the second one cannot bind and will fail. Give each server its own --port instead.
According to vLLM issue #217, Docker port mapping and GPU assignment must be correct, and you should ensure that each container exposes the correct port and only the intended GPU. Additionally, vLLM issue #10643 and vLLM issue #2248 highlight that vLLM's GPU memory accounting can cause issues when running multiple instances on the same GPU; in your case each container is assigned a separate GPU, so this should not be the direct cause unless there is a driver or NCCL version conflict. Make sure your Docker run commands include --ipc=host or a sufficiently large --shm-size to avoid shared-memory issues, and verify that your NVIDIA driver and NCCL versions are up to date and compatible with your vLLM and CUDA versions.
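For reference, here is a minimal sketch of how the two containers could be started side by side, assuming the official vllm/vllm-openai image; the image tag and model names are placeholders for whatever you actually run:

```bash
# GPU 0: with --network host, -p is ignored, so the server's own --port decides the listening port
docker run -d --name vllm-gpu0 \
  --gpus device=0 \
  --network host \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 8000

# GPU 1: same pattern, but a different GPU, port, and model
docker run -d --name vllm-gpu1 \
  --gpus device=1 \
  --network host \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.3 \
  --port 8001
```

Open WebUI would then be pointed at http://<host-ip>:8000/v1 and http://<host-ip>:8001/v1 as two separate OpenAI-compatible endpoints.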
Would you like a step-by-step troubleshooting checklist for this setup?
Would it work if the Docker containers are run in separate VMs?
What is shm?
Yes, running each Docker container in a separate VM with its own GPU passthrough should work, as each VM will have exclusive access to its assigned GPU and isolated resources, reducing conflicts seen when sharing GPUs or system resources between containers in the same VM. This setup is commonly used for multi-GPU workloads in virtualized environments, as confirmed in vLLM issue #10643.
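Before splitting the setup into separate VMs, it may be worth verifying that each container really sees only its assigned GPU. A quick check, assuming the NVIDIA Container Toolkit is installed; the CUDA image tag is just an example:

```bash
# Each command should list exactly one, different GPU (nvidia-smi -L prints one line per visible GPU)
docker run --rm --gpus device=0 nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi -L
docker run --rm --gpus device=1 nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi -L
```

If either command shows both GPUs or fails outright, the problem is at the Docker/driver level rather than in vLLM, and separate VMs would only be working around it.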
shm refers to shared memory, specifically the /dev/shm directory in Linux, which is used for fast inter-process communication. vLLM and PyTorch use shared memory for efficient data exchange between processes, especially for tensor parallelism. If the shared memory size is too small, you may encounter errors or degraded performance; you can increase it in Docker with --shm-size or use --ipc=host (vLLM Docker docs).
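For example, you can check how much shared memory a running container actually has (Docker's default /dev/shm is only 64 MB), or start it with a larger private /dev/shm instead of --ipc=host. The container name, model, and the 16g size are just example values:

```bash
# Show the size and usage of /dev/shm inside a running container
docker exec vllm-gpu0 df -h /dev/shm

# Alternative to --ipc=host: give the container its own, larger /dev/shm
docker run -d --name vllm-gpu0 --gpus device=0 --network host \
  --shm-size=16g \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct --port 8000
```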
Would you like more detail on configuring shared memory or VM GPU passthrough?