Unable to use vLLM 0.10.1-gptoss on GH200 (aarch64) — source for custom wheel not available?

Hi everyone,

I’m currently running vLLM on NVIDIA GH200 systems (aarch64 architecture) and have been using it successfully with older builds. I’d like to upgrade to vLLM 0.10.1-gptoss, but I’m running into an issue.

There is a prebuilt wheel available for this version, but it’s only built for amd64, and unfortunately, I can’t use it on my ARM-based infrastructure.

I’d normally build it from source, but I haven’t been able to find the corresponding source code or tag for 0.10.1-gptoss in the GitHub repository.

Is there any plan to:

  • Publish the source code for the 0.10.1-gptoss build?
  • Provide ARM64-compatible wheels?

Thanks in advance, and appreciate all the hard work on vLLM — it’s an excellent project.

Best regards,
Raphael

There is currently no official ARM64 (aarch64) prebuilt wheel for vLLM 0.10.1-gptoss; only x86_64 wheels are published. The source code or tag for 0.10.1-gptoss is not available in the main vLLM GitHub repository, as this is a special build for GPT-OSS and not a standard release branch. The maintainers have not announced plans to publish ARM64 wheels or the corresponding source for this build, and users in similar situations have reported the same limitation. The recommended approach is to use the prebuilt x86_64 wheel or Docker image if possible, or to wait for official ARM64 support or source release for this variant. See discussion in GitHub Issue #22276 and PR #22259.
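To check quickly whether a given wheel can be installed on a machine at all, you can compare the platform tag in the wheel's filename against the local architecture. The sketch below is a simplified illustration, not how pip decides (pip uses the full tag logic in `packaging.tags`). The helper names and the example wheel filename are hypothetical; substitute the actual name of the file you downloaded.

```python
import platform


def wheel_platform_tags(filename):
    """Return the platform tags encoded in a wheel filename.

    Wheel filenames end in "-{python tag}-{abi tag}-{platform tag}.whl";
    several platform tags may be joined with dots.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    stem = filename[: -len(".whl")]
    return stem.split("-")[-1].split(".")


def wheel_supports_machine(filename, machine=None):
    """Rough check: does any platform tag target this CPU architecture?

    `machine` defaults to platform.machine(), which reports "aarch64"
    on Linux ARM64 systems such as GH200 and "x86_64" on amd64 hosts.
    """
    machine = machine or platform.machine()
    tags = wheel_platform_tags(filename)
    return any(tag == "any" or tag.endswith("_" + machine) for tag in tags)


# Hypothetical filename, used only to illustrate the tag layout.
example = "vllm-0.10.1+gptoss-cp38-abi3-manylinux1_x86_64.whl"
print(wheel_supports_machine(example, "aarch64"))
print(wheel_supports_machine(example, "x86_64"))
```

If the check returns False for `aarch64`, pip will refuse the wheel on GH200 and building from source is the remaining option.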

If you need to run vLLM 0.10.1-gptoss on ARM64, you may need to wait for the maintainers to provide ARM64 wheels or the source code, as there is no supported method to build this version from source on ARM at this time. Would you like more detail on possible workarounds or community-maintained ARM64 builds?

Apparently the source for the gptoss build is in a forked repo: Support gpt-oss by zyongye · Pull Request #22259 · vllm-project/vllm · GitHub. If you clone the zyongye fork and follow the instructions there, you should be able to build the 0.10.1-gptoss version. You just need to check out the commit mentioned and build from source.

I have built it and have gpt-oss running on our organization’s GH200, and it is very slow. I’m not sure why, but for me it only runs at around 11-15 tok/s, while the same model and quant level on llama.cpp runs at 150 tok/s on the GH200.

Not sure if it’s because of ARM or something else.

Correct. To build vLLM 0.10.1-gptoss from source for ARM64, clone the zyongye fork, check out commit 8260948cdc379d13bf4b80d3172a03d21a983e05, and follow the build instructions provided in PR #22259. This approach has been confirmed to work for users on ARM/A100 and similar platforms.
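The clone-and-checkout part of those steps can be sketched as a small script. This is a rough sketch under assumptions: the fork URL is assumed to follow the usual GitHub fork naming and should be verified against PR #22259, and the function names are made up for illustration. The build itself is deliberately left out, since the exact commands come from the PR's instructions.

```python
import subprocess
from pathlib import Path

# Assumption: the fork is hosted under the usual GitHub fork URL.
# Verify this against PR #22259 before relying on it.
FORK_URL = "https://github.com/zyongye/vllm.git"
# Commit hash quoted in this thread.
COMMIT = "8260948cdc379d13bf4b80d3172a03d21a983e05"


def checkout_commands(dest, url=FORK_URL, commit=COMMIT):
    """Return the git commands (as argv lists) that fetch the fork
    and pin the working tree to the given commit."""
    dest = str(Path(dest))
    return [
        ["git", "clone", url, dest],
        ["git", "-C", dest, "checkout", commit],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Usage (requires git and network access):
# run_all(checkout_commands("vllm-gptoss"))
```

After the checkout succeeds, continue with the build instructions from the PR inside the cloned directory.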

Would you like a step-by-step guide for building on ARM64?
