I have been trying to reduce the size of the Docker image built from the Dockerfile on GitHub. From what I can tell, the final image is large because it is based on the devel image.
My question is: why don’t we use the runtime image instead of the devel image to reduce the image size? Would that affect performance or anything else?
Relevant discussion:
→ I have tried this method, but it only reduces the image size by about 100 MB.
Using the CUDA “devel” image is necessary for building vLLM from source because it includes development tools such as nvcc (the CUDA compiler) that are absent from the “runtime” image. The “runtime” image is smaller, but it lacks the tools required to compile vLLM and its dependencies, so switching to it would break the build rather than merely shrink the image. Once built, you could in principle copy only the runtime artifacts into a new image based on the “runtime” image (a multi-stage build), but this is not the default approach in vLLM’s official Dockerfiles. Inference performance is not affected by the choice of base image as long as all required runtime libraries are present; this is a question of build compatibility, not runtime speed or efficiency. According to this discussion and this issue, the main driver of the large image size is the inclusion of CUDA, PyTorch, and multi-arch support, not the choice of base image alone.
Would you like more detail on how to use multi-stage builds to further reduce the image size?
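For reference, a multi-stage build along these lines might look like the following. This is a hypothetical sketch, not vLLM’s official Dockerfile; the image tags, Python package path, and install commands are illustrative assumptions.

```dockerfile
# Hypothetical multi-stage sketch (NOT the official vLLM Dockerfile).
# Stage 1: build in the "devel" image, which ships nvcc and friends.
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip
RUN pip install --no-cache-dir vllm   # extensions compile here with nvcc

# Stage 2: copy only the installed packages into the smaller "runtime" image.
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3
# Path is an assumption for system Python 3.10 on Ubuntu 22.04.
COPY --from=build /usr/local/lib/python3.10/dist-packages \
                  /usr/local/lib/python3.10/dist-packages
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```

Note the caveat raised elsewhere in this thread: because several vLLM dependencies JIT-compile kernels at runtime, the final stage may still need nvcc and gcc in practice, which is why this approach alone tends to yield only modest savings (on the order of the ~100 MB reported above).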
Various dependencies of vLLM (e.g. torch.compile, Triton, and FlashInfer) JIT-compile the actual device code that runs. This allows vLLM to optimize for the specific model and hardware profile without shipping a gigantic binary. Unfortunately, it also means that various “devel” dependencies such as nvcc and gcc must be available at runtime.
That being said, you are right that we could have a slimmer Dockerfile. There are many competing priorities for the team, so we would welcome any help to resolve this!
You are correct: vLLM and its dependencies (like torch.compile, Triton, and FlashInfer) require JIT compilation, which means development tools such as nvcc and gcc must be present at runtime for kernel compilation and auto-tuning. This is why the “devel” image is used, as confirmed in this discussion and this GitHub issue. While a slimmer Dockerfile is possible, removing these tools would break JIT compilation unless all kernels are precompiled for every possible hardware configuration, which is not practical.
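To illustrate the runtime requirement, a check like the following could be run inside a container to see whether the JIT toolchain is present on PATH. The tool list and the helper name `missing_tools` are my own illustrative assumptions, not part of vLLM.

```python
import shutil

# Hypothetical checklist of build tools that JIT paths (Triton,
# torch.compile, FlashInfer) may invoke at runtime; the exact set
# depends on which backends are enabled.
REQUIRED_TOOLS = {
    "gcc": "host compiler used by Triton / torch.compile",
    "nvcc": "CUDA compiler, shipped only in the 'devel' image",
}

def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools not found on PATH."""
    return [name for name in tools if which(name) is None]

# Example: simulate a 'runtime' image where gcc exists but nvcc does not.
fake_path = {"gcc": "/usr/bin/gcc"}
print(missing_tools(which=fake_path.get))  # prints ['nvcc']
```

In a container built on the “runtime” image, a check like this would typically flag nvcc as missing, and JIT compilation of CUDA kernels would fail at that point.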
Would you like more detail on multi-stage builds or community efforts to slim down the image?