Compiling the C++/CUDA files is very slow. How can I enable a multi-core build? It seems to be using only one core.
=> [build 6/8] RUN --mount=type=cache,target=/root/.cache/ccache --mount=type=cache,target=/root/.cache/uv --mount=type=bind,source=.git,target=.git if [ "$USE_SCCACHE" != "1" ]; then 6609.0s
=> => # [53/508] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/cutlass/c3x/scaled_mm_blockwise_sm90_fp8.cu.o
=> => # [54/508] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/cutlass/scaled_mm_c2x.cu.o
=> => # [55/508] Building CUDA object CMakeFiles/_C.dir/csrc/sparse/cutlass/sparse_scaled_mm_c3x.cu.o
=> => # [56/508] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/cutlass/moe/grouped_mm_c3x_sm90.cu.o
=> => # [57/508] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/w8a8/cutlass/moe/moe_data.cu.o
=> => # [58/508] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_dispatch.cu.o
To enable multi-core builds and speed up C++/CUDA compilation in vLLM, set the environment variable MAX_JOBS to the number of parallel compile jobs you want (typically around your CPU core count). For CUDA builds, also set NVCC_THREADS (usually 2). For a local build, run export MAX_JOBS=16; export NVCC_THREADS=2 before building. For a Docker build, pass the equivalent build args instead: --build-arg max_jobs=16 --build-arg nvcc_threads=2. Your log comes from a Docker build, so the build args are what apply there; variables exported in the host shell are not passed into docker build. Either way, the build system can then compile on multiple CPU cores in parallel, which should cut build time considerably. Be aware that high values increase memory usage and may cause OOM on machines with limited RAM. For CUDA compilation the effective number of jobs is MAX_JOBS / NVCC_THREADS, so pick the two values together.
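As a rough sketch of the settings described above, the snippet below derives MAX_JOBS from the core count and prints the matching Docker command rather than running it. The use of nproc, the one-job-per-core starting point, the clamp to at least one job, and the vllm-local image tag are all assumptions for illustration, not values taken from the vLLM docs.

```shell
# Assumption: one compile job per core is a reasonable starting point.
# Lower MAX_JOBS if the build runs out of memory.
MAX_JOBS="$(nproc)"
NVCC_THREADS=2
export MAX_JOBS NVCC_THREADS

# As noted above, CUDA compilation effectively runs MAX_JOBS / NVCC_THREADS jobs.
# Clamping to at least 1 is an assumption made here for small machines.
EFFECTIVE_JOBS=$(( MAX_JOBS / NVCC_THREADS ))
[ "$EFFECTIVE_JOBS" -ge 1 ] || EFFECTIVE_JOBS=1
echo "MAX_JOBS=$MAX_JOBS NVCC_THREADS=$NVCC_THREADS -> about $EFFECTIVE_JOBS concurrent CUDA compile jobs"

# For a Docker build, pass the same values as build args.
# "vllm-local" is a placeholder image tag; the command is only printed here.
DOCKER_CMD="docker build --build-arg max_jobs=$MAX_JOBS --build-arg nvcc_threads=$NVCC_THREADS -t vllm-local ."
echo "$DOCKER_CMD"
```

If memory is tight, halving MAX_JOBS and rebuilding is usually a simpler first step than tuning NVCC_THREADS.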
If you are using CMake directly, you can also set the parallelism via the jobs field of a build preset in your CMakeUserPresets.json, or by passing -j to cmake --build. For repeated builds, ccache or sccache can further speed up rebuilds by caching compilation results. Would you like a step-by-step example or Dockerfile snippet?