Current vLLM Docker image size is 12.64 GB; how can I reduce it?

The current vllm-openai Docker image is 12.64 GB, which is quite big.

I am wondering if there is any way to reduce the image size, e.g. by targeting a specific environment (a particular GPU such as the NVIDIA A100, or a single architecture) or by removing dependencies that are not needed in production.

The vLLM Docker image is large mainly because of bundled dependencies: PyTorch, the CUDA libraries, and precompiled kernels for multiple GPU architectures. To reduce the size, you can build a custom image that targets only your GPU architecture (e.g., compute capability 8.0 for the A100) by passing --build-arg torch_cuda_arch_list="8.0" at build time, and prune optional or development dependencies. Alternatively, start from a slim CUDA runtime base image and install only the production dependencies you need, or use a multi-stage build to keep build-time toolchains out of the final runtime layers. The official docs cover custom Docker builds, pruning tips, and example Dockerfiles.
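For example, an architecture-restricted build from a vLLM source checkout might look like this (the image tag is illustrative; torch_cuda_arch_list is a build arg accepted by vLLM's official Dockerfile). The command is shown as a dry run so it is safe to paste anywhere; drop the echo and run it from a vLLM checkout to actually build:

```shell
# Compile CUDA kernels only for compute capability 8.0 (A100).
# The default build covers many architectures, which inflates the image.
ARCH_LIST="8.0"

# Dry run: prints the build command instead of executing it.
echo DOCKER_BUILDKIT=1 docker build . \
    --target vllm-openai \
    --tag vllm/vllm-openai:a100-only \
    --build-arg torch_cuda_arch_list="$ARCH_LIST"
```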

Would you like a step-by-step example Dockerfile for a minimal, A100-only production image?

yes, a minimal A100 only production image would be ideal. Do you have an estimate on how much would that reduce image size?

Targeting only the A100 (compute capability 8.0) and removing optional/dev dependencies can likely cut several GB, potentially bringing the image down to 6–8 GB, though the exact savings depend on the base image and which dependencies you exclude. Most of the size comes from CUDA, PyTorch, and multi-architecture kernel support, so limiting the build to a single architecture helps significantly, but getting below 5 GB is considered unrealistic given the size of the core libraries (discussion, docs).
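To see which layers actually dominate on your machine, you can inspect the per-layer sizes of the published image (image name assumed). Shown as a dry run; drop the echo to run it for real, which requires Docker and the image pulled locally:

```shell
# Print each layer of the image with its size, so you can see which
# build steps (CUDA libs, PyTorch wheels, apt caches, ...) dominate.
IMAGE="vllm/vllm-openai:latest"
echo docker history --no-trunc --human "$IMAGE"
```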

Here’s a minimal example Dockerfile for A100-only production use:

FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# Install Python and pip
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Limit CUDA kernel compilation to the A100 (compute capability 8.0).
# Note: this only takes effect when vLLM is compiled from source; the
# prebuilt PyPI wheel ships kernels for all supported architectures.
ENV TORCH_CUDA_ARCH_LIST="8.0"

# Install only vLLM and its production dependencies
RUN pip3 install --no-cache-dir vllm

# Expose the OpenAI-compatible API port
EXPOSE 8000

ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
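Assuming the Dockerfile above is saved as Dockerfile in an empty directory, building and serving a model might look like this (the tag and model name are illustrative). Shown as a dry run; drop the echo prefixes to execute for real, which requires Docker plus the NVIDIA Container Toolkit for --gpus:

```shell
# Build the minimal image, then run the OpenAI-compatible server;
# arguments after the image name are passed to the API server entrypoint.
TAG="vllm-a100:latest"
echo docker build -t "$TAG" .
echo docker run --gpus all -p 8000:8000 "$TAG" --model meta-llama/Llama-2-7b-hf
```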

For even more reduction, build vLLM from source with --build-arg torch_cuda_arch_list="8.0" so that only A100 kernels are compiled (docs). Would you like a more advanced multi-stage build example?
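As a preview, a multi-stage source build could be sketched like this. The stage names, checkout path, and wheel directory are assumptions rather than the official vLLM Dockerfile layout, and a real build will need additional build tooling (cmake, ninja, a matching torch):

```dockerfile
# --- Build stage: compile vLLM with A100-only kernels ---
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip python3-dev git && \
    rm -rf /var/lib/apt/lists/*
ENV TORCH_CUDA_ARCH_LIST="8.0"
RUN git clone https://github.com/vllm-project/vllm.git /src && \
    pip3 wheel --no-cache-dir --wheel-dir /wheels /src

# --- Runtime stage: only the CUDA runtime plus the built wheels ---
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
COPY --from=build /wheels /wheels
RUN pip3 install --no-cache-dir /wheels/*.whl && rm -rf /wheels
EXPOSE 8000
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```

The point of the split is that compilers, headers, and the git checkout stay in the build stage, so the final image carries only the CUDA runtime, Python, and the installed wheels.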
