You are correct: vLLM and several of its dependencies (such as torch.compile, Triton, and FlashInfer) compile kernels just in time (JIT), so development tools such as nvcc and gcc must be present at runtime for kernel compilation and auto-tuning. This is why the “devel” image is used, as confirmed in this discussion and this GitHub issue. A slimmer Dockerfile is possible, but removing these tools would break JIT compilation unless every kernel were precompiled for every possible hardware configuration, which is not practical.
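If you do experiment with a slimmer image, a quick check inside the container can tell you whether the compilers mentioned above are still reachable. The following is only a minimal sketch: the helper names (find_jit_tools, missing_jit_tools) are made up for illustration and are not part of vLLM, and finding a tool on PATH does not by itself prove that JIT compilation will succeed.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Tools named in the answer above; this tuple is an assumption and can be
# extended if your setup relies on other compilers.
JIT_TOOLS = ("nvcc", "gcc")


def find_jit_tools(
    tools: Iterable[str] = JIT_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: which(tool) for tool in tools}


def missing_jit_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_jit_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    missing = missing_jit_tools(found)
    if missing:
        print("Missing tools, JIT compilation may fail:", ", ".join(missing))
```

Running this in both the “devel” image and a candidate slim image shows which tools the slim image has dropped.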
Would you like more detail on multi-stage builds or community efforts to slim down the image?