Hi,
I have 2 GPUs passed through to a Proxmox VM.
I use them separately in different Docker containers.
The problem is that they do not work simultaneously.
When I start both containers, some error occurs and Open WebUI can't find either of them.
When I stop one and start the other, it always works. I pass --gpus device=1 to one container and --gpus device=0 to the other. One has --network host -p 8000:8000 and the other --network host -p 8001:8000, plus different model names etc. What is the problem here?
This issue is likely due to how Docker, the GPU drivers, and vLLM interact around GPU allocation and device visibility. When running two containers, each assigned a different GPU via --gpus device=0 and --gpus device=1, vLLM should be able to use each GPU independently. Problems can still arise if the containers are not properly isolated, if they conflict over shared resources such as shared memory (/dev/shm), or if the GPU driver/NCCL versions are not compatible with a multi-container, multi-GPU setup. Also, --network host deserves a closer look: with host networking, both containers share the host's network namespace and the -p mappings are ignored, so if both vLLM servers listen on their default port (8000), the second one cannot bind and will fail. Give each server its own --port instead.
According to vLLM issue #217, Docker port mapping and GPU assignment must be correct, and you should ensure that each container exposes the correct port and only the intended GPU. Additionally, vLLM issue #10643 and vLLM issue #2248 highlight that vLLM's GPU memory accounting can cause issues when running multiple instances on the same GPU; in your case each container is assigned a separate GPU, so this should not be the direct cause unless there is a driver or NCCL version conflict. Make sure your Docker run commands include --ipc=host or a sufficiently large --shm-size to avoid shared-memory issues, and verify that your NVIDIA driver and NCCL versions are up to date and compatible with your vLLM and CUDA versions.
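For reference, here is a minimal sketch of how the two containers could be started side by side, assuming the official vllm/vllm-openai image; the image tag and model names are placeholders for whatever you actually run:

```bash
# GPU 0: with --network host, -p is ignored, so the server's own --port decides the listening port
docker run -d --name vllm-gpu0 \
  --gpus device=0 \
  --network host \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 8000

# GPU 1: same pattern, but a different GPU, port, and model
docker run -d --name vllm-gpu1 \
  --gpus device=1 \
  --network host \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.3 \
  --port 8001
```

Open WebUI would then be pointed at http://<host-ip>:8000/v1 and http://<host-ip>:8001/v1 as two separate OpenAI-compatible endpoints.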
Would you like a step-by-step troubleshooting checklist for this setup?
Would it work if the Docker containers are run in separate VMs?
What is shm?
Yes, running each Docker container in a separate VM with its own GPU passthrough should work, as each VM will have exclusive access to its assigned GPU and isolated resources, reducing conflicts seen when sharing GPUs or system resources between containers in the same VM. This setup is commonly used for multi-GPU workloads in virtualized environments, as confirmed in vLLM issue #10643.
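Before splitting the setup into separate VMs, it may be worth verifying that each container really sees only its assigned GPU. A quick check, assuming the NVIDIA Container Toolkit is installed; the CUDA image tag is just an example:

```bash
# Each command should list exactly one, different GPU (nvidia-smi -L prints one line per visible GPU)
docker run --rm --gpus device=0 nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi -L
docker run --rm --gpus device=1 nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi -L
```

If either command shows both GPUs or fails outright, the problem is at the Docker/driver level rather than in vLLM, and separate VMs would only be working around it.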
shm refers to shared memory, specifically the /dev/shm directory in Linux, which is used for fast inter-process communication. vLLM and PyTorch use shared memory for efficient data exchange between processes, especially for tensor parallelism. If the shared memory size is too small, you may encounter errors or degraded performance; you can increase it in Docker with --shm-size or use --ipc=host (vLLM Docker docs).
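For example, you can check how much shared memory a running container actually has (Docker's default /dev/shm is only 64 MB), or start it with a larger private /dev/shm instead of --ipc=host. The container name, model, and the 16g size are just example values:

```bash
# Show the size and usage of /dev/shm inside a running container
docker exec vllm-gpu0 df -h /dev/shm

# Alternative to --ipc=host: give the container its own, larger /dev/shm
docker run -d --name vllm-gpu0 --gpus device=0 --network host \
  --shm-size=16g \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct --port 8000
```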
Would you like more detail on configuring shared memory or VM GPU passthrough?