vLLM Docker image for Mac (Apple Silicon) CPU

Hello everyone,

I want to run vLLM on my MacBook Pro (M2 Pro).

According to the official documentation, I have to build vLLM from source to run it on the Apple M2 Pro CPU.

However, my macOS version is too old to build from source, so I would like to download a vLLM Docker image for the M2 Pro CPU and run vLLM in a Docker container instead.

But I could not find such an image on Docker Hub.

So I am asking for help here: where can I download a vLLM Docker image for a Mac M2 Pro?

There is currently no official vLLM Docker image for Mac (Apple Silicon) CPU on Docker Hub. The official vLLM images on Docker Hub (vllm/vllm-openai) are built for Linux/amd64 and require NVIDIA GPUs; they do not support Apple Silicon CPUs out of the box. For Apple Silicon (M1/M2/M3), the official documentation and maintainers state you must build vLLM from source on your Mac, as there are no pre-built wheels or images for this device (docs, github_file).

There are some unofficial, community-built images (e.g., on Docker Hub by individual users), but there is no formal, supported vLLM CPU image for the Mac M2 Pro. The only official CPU image is published to AWS ECR (public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.7.2), but this is for Linux/amd64, not Mac ARM (github_issue). If you want to run vLLM on your Mac M2 Pro, you must upgrade your OS to meet the build requirements and build from source, or attempt to build a Docker image yourself using the provided Dockerfiles and cross-compilation instructions (docs, github_file).
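For reference, here is a minimal sketch of the from-source route on Apple Silicon, assuming a supported macOS version with the Xcode command line tools and a recent Python; the requirements file name may differ between vLLM versions:

# build the CPU backend from source on Apple Silicon (sketch; file names may vary by version)
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements/cpu.txt
pip install -e .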

Would you like more detail on how to build vLLM from source or create a custom Docker image for Apple Silicon?

I’m also facing a similar issue.
I’m planning to build a Docker image today and publish it on Docker Hub. I will share it here once I’m done. Hope that helps.

There is currently no official vLLM Docker image for Mac (Apple Silicon) CPUs on Docker Hub, and the maintainers recommend building from source for Apple Silicon (docs). If you successfully build and publish a working Docker image for Mac M2/M3 CPUs, it would be helpful to share the Dockerfile and image link with the community, as others have requested this but no official or widely shared solution exists (github_issue).

Would you like step-by-step guidance on building such a Docker image for Apple Silicon?

Please tell me the image name after pushing it to Docker Hub, thank you! :heart_eyes:

Sure thing @Pigcanflysohigh! I’ve been trying to build the image locally, but it’s failing at my end as well. I will try again on a bigger machine this week.

Will let you know once I’m done.

The first time, I built it with the command: docker build -f docker/Dockerfile.cpu --tag vllm-cpu-env .
It failed, and the output was: ERROR: failed to solve: ResourceExhausted: process "/bin/sh -c VLLM_TARGET_DEVICE=cpu python3 setup.py bdist_wheel" did not complete successfully: cannot allocate memory

The second time, I built it with: docker build -f docker/Dockerfile.cpu --tag vllm-cpu-env . --build-arg MAKEFLAGS="-j1"
but I got the same output as the first time.

I asked GPT about the cause, and it said there is not enough memory.

Your error, ResourceExhausted: ... cannot allocate memory, means your system ran out of RAM during the vLLM Docker build. This is a common issue, especially on machines with limited memory, as vLLM’s build process is memory-intensive (github_issue, github_issue, docs).

To resolve this, try adding swap space or reducing parallelism by setting MAX_JOBS=1 (not MAKEFLAGS), e.g.:

export MAX_JOBS=1   # limit compile parallelism to reduce peak memory use
docker build -f docker/Dockerfile.cpu --tag vllm-cpu-env .
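The build also runs inside the Docker Desktop VM, so it can only use the memory allocated to that VM in Docker Desktop's Resources settings. A quick way to see the current limit (assuming Docker Desktop is running):

# the "Total Memory" value is the RAM available to Docker builds; raise it in Docker Desktop if it is small
docker info | grep -i 'total memory'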

Alternatively, add swap to your system or build on a machine with more RAM (github_issue). Would you like more detailed troubleshooting steps?

Thank you,

I have built the image successfully; it is about 2.54 GB.

Great to hear you built the image successfully! A size of about 2.54 GB is typical for a vLLM CPU Docker image (github_pull_request). If you plan to share it, consider pushing it to Docker Hub for others who need Apple Silicon support.
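If you do share it, here is a minimal sketch of tagging and pushing the local image (yourusername/vllm-cpu-apple:0.1 is only a placeholder, not an existing repository):

# log in, tag the local image under your Docker Hub namespace, and push it
docker login
docker tag vllm-cpu-env yourusername/vllm-cpu-apple:0.1
docker push yourusername/vllm-cpu-apple:0.1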

Would you like tips on running vLLM in the container or optimizing the image further?

I still have a question about this argument:

-e VLLM_CPU_OMP_THREADS_BIND=

Apple Silicon is a single chip that contains both a CPU and a GPU. Does vLLM run on the CPU or the GPU, and what is the reason?

On Apple Silicon (M1/M2/M3), vLLM only supports running on the CPU, not the GPU. The environment variable VLLM_CPU_OMP_THREADS_BIND is used to control which CPU cores vLLM’s OpenMP threads are bound to; it does not affect GPU usage (docs, docs).

The reason is that vLLM does not support Apple’s Metal (MPS) GPU backend—only CPU inference is available on Mac. All computation is performed on the CPU, and the GPU is not used (github_issue, github_pull_request).
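To decide which cores to bind, it helps to know how many cores the chip has. On recent macOS versions you can check from the host like this (the perflevel sysctl keys are specific to Apple Silicon):

# total logical CPUs, plus the performance/efficiency core split
sysctl -n hw.ncpu
sysctl -n hw.perflevel0.physicalcpu   # performance cores
sysctl -n hw.perflevel1.physicalcpu   # efficiency cores

Note that inside the container the visible CPUs are those assigned to the Docker Desktop VM, so the core IDs in VLLM_CPU_OMP_THREADS_BIND refer to those virtual CPUs, not directly to the physical cores.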

Would you like more detail on thread binding or vLLM’s device support?

I have run into another problem.

docker run --rm \
  --privileged=true \
  --shm-size=4g \
  -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
  -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
  vllm-cpu-env \
  --model=meta-llama/Llama-3.2-1B-Instruct \
  --dtype=bfloat16 \
  <other vLLM OpenAI server arguments>

This is how the official documentation says to run the container on my Mac, but I am not sure how to configure the two arguments VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> and VLLM_CPU_KVCACHE_SPACE=<KV cache space>.

Can you give an example?

For VLLM_CPU_OMP_THREADS_BIND, should I write every core or the number of cores?

What is the unit of VLLM_CPU_KVCACHE_SPACE? GB, MB, or KB?

For Apple Silicon (or any CPU backend), here’s how to set those variables:

  • VLLM_CPU_OMP_THREADS_BIND should be a list or range of CPU core IDs (not just a number). For example, to use cores 0–7:
    -e VLLM_CPU_OMP_THREADS_BIND=0-7
    This binds 8 OpenMP threads to logical CPU cores 0 through 7 (docs).

  • VLLM_CPU_KVCACHE_SPACE sets the KV cache size in GiB (not MB or KB). For example, to allocate 4 GiB:
    -e VLLM_CPU_KVCACHE_SPACE=4
    (docs)

Example full command:

docker run --rm \
  --privileged=true \
  --shm-size=4g \
  -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=4 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-7 \
  vllm-cpu-env \
  --model=meta-llama/Llama-3.2-1B-Instruct \
  --dtype=bfloat16
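Once the server is up, a quick way to confirm it is serving requests (a sketch; the model name must match the --model value you started the server with):

# list the served models, then send a small chat completion request
curl http://localhost:8000/v1/models

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 32
      }'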

Would you like more detail on how to choose the best values for your hardware?

What about the following error logs?

NotImplementedError: Could not run '_C::onednn_mm' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit ``https://fburl.com/ptmfixes`` for possible resolutions. '_C::onednn_mm' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMTIA, AutogradMAIA, AutogradMeta, Tracer, AutocastCPU, AutocastMTIA, AutocastMAIA, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_DP0': 1}

This error means the vLLM build is missing the required oneDNN (DNNL) backend for the _C::onednn_mm operator, so the operator cannot run on your CPU. This usually happens if oneDNN was not installed or not properly linked during the vLLM build process (github_issue, github_issue).

To fix:

  • Install oneDNN (DNNL) and its development headers before building vLLM, e.g.
    sudo apt-get install -y libdnnl-dev
  • Then rebuild vLLM from source so it can find and use oneDNN.

Would you like more detailed build instructions for your platform?

First question: my system is macOS, not Ubuntu. Should I use apt-get to install oneDNN?

Second question: I have already built vLLM from source on my computer. If I install oneDNN, can I build vLLM again?

Third question: how do I install oneDNN on macOS, and how can I check whether it was installed successfully?

  1. On macOS, you cannot use apt-get. Instead, use Homebrew to install oneDNN:
brew install onednn

(oneDNN Homebrew formula)

  2. Yes, after installing oneDNN, you should rebuild vLLM from source so it can detect and link to oneDNN (github_issue).

  3. To check if oneDNN is installed, run:

brew list onednn

You should see the installed files. You can also check the include and lib directories, e.g., /opt/homebrew/include/dnnl.h and /opt/homebrew/lib/libdnnl.dylib.
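A quick way to check both the formula and the files it installed (paths assume the default Apple Silicon Homebrew prefix /opt/homebrew):

# confirm the Homebrew formula version and the headers/library it installed
brew list --versions onednn
ls /opt/homebrew/include/dnnl.h
ls /opt/homebrew/lib/libdnnl*.dylib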

Would you like step-by-step rebuild instructions for vLLM on macOS?

I installed oneDNN with brew successfully, built vLLM again, and then built a new Docker image. But when I run the container again, I get the same error logs:

NotImplementedError: Could not run '_C::onednn_mm' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit ``https://fburl.com/ptmfixes`` for possible resolutions. '_C::onednn_mm' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMTIA, AutogradMAIA, AutogradMeta, Tracer, AutocastCPU, AutocastMTIA, AutocastMAIA, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_DP0': 1}

On macOS, vLLM does not officially support the oneDNN backend, and the _C::onednn_mm operator is not built for the Apple Silicon CPU backend—even if you install oneDNN with Homebrew (github_issue, docs). The vLLM build system is designed for oneDNN on Linux/x86, not macOS, so the operator is omitted during compilation.

This means you will continue to see this error on Mac, and there is currently no supported workaround to enable oneDNN or the _C::onednn_mm operator on Apple Silicon. vLLM on Mac only supports the basic CPU backend, not the optimized oneDNN path (github_issue).
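If you want to confirm this from inside the environment where your vLLM build runs, here is a hedged check; it relies on a private PyTorch helper (torch._C._dispatch_has_kernel_for_dispatch_key) and assumes import vllm succeeds in that environment:

# ask PyTorch whether a CPU kernel was registered for the failing custom op;
# in an affected build this is expected to print False
python3 -c "import vllm, torch; print(torch._C._dispatch_has_kernel_for_dispatch_key('_C::onednn_mm', 'CPU'))"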

Would you like more detail on supported features or possible alternatives?
