Unable to use vLLM 0.10.1-gptoss on GH200 (aarch64) — source for custom wheel not available?

rhabichl · August 7, 2025, 6:25am

Hi everyone,

I’m currently running vLLM on NVIDIA GH200 systems (aarch64 architecture) and have been using it successfully with older builds. I’d like to upgrade to vLLM 0.10.1-gptoss, but I’m running into an issue.

There is a prebuilt wheel available for this version, but it’s only built for amd64, and unfortunately, I can’t use it on my ARM-based infrastructure.

I’d normally build it from source, but I haven’t been able to find the corresponding source code or tag for 0.10.1-gptoss in the GitHub repository.

Is there any plan to:

Publish the source code for the 0.10.1-gptoss build?
Provide ARM64-compatible wheels?

Thanks in advance, and appreciate all the hard work on vLLM — it’s an excellent project.

Best regards,
Raphael

RunLLM · August 7, 2025, 6:25am

There is currently no official ARM64 (aarch64) prebuilt wheel for vLLM 0.10.1-gptoss; only x86_64 wheels are published. The source code or tag for 0.10.1-gptoss is not available in the main vLLM GitHub repository, as this is a special build for GPT-OSS and not a standard release branch. The maintainers have not announced plans to publish ARM64 wheels or the corresponding source for this build, and users in similar situations have reported the same limitation. The recommended approach is to use the prebuilt x86_64 wheel or Docker image if possible, or to wait for official ARM64 support or source release for this variant. See discussion in GitHub Issue #22276 and PR #22259.

If you need to run vLLM 0.10.1-gptoss on ARM64, you may need to wait for the maintainers to provide ARM64 wheels or the source code, as there is no supported method to build this version from source on ARM at this time. Would you like more detail on possible workarounds or community-maintained ARM64 builds?

RWayne93 · August 15, 2025, 9:07am

Apparently the source for the gptoss build is in a forked repo: see "Support gpt-oss by zyongye · Pull Request #22259 · vllm-project/vllm · GitHub". If you clone the zyongye fork and follow the instructions, you’ll be able to build the 0.10.1-gptoss build. You just need to check out the commit mentioned and build from source.

I have built it and have gpt-oss running on our organization’s GH200, and it is very slow. I’m not sure why, but for me it only runs at around 11-15 tok/s; the same model and quant level on llama.cpp runs at 150 tok/s on the GH200.

Not sure if it’s because of ARM or something else.
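
To compare the two servers like for like, a rough throughput check along these lines may help. It is only a sketch: it assumes the server exposes an OpenAI-compatible `/v1/completions` endpoint whose response includes `usage.completion_tokens`, and the base URL, model name, and prompt in the usage note are placeholders, not values from this thread. The timing covers the whole request, so prompt processing is included in the figure.

```python
import json
import time
import urllib.request


def tokens_per_second(completion_tokens, elapsed_s):
    """Generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def measure(base_url, model, prompt, max_tokens=256):
    """Send one completion request and return end-to-end tokens per second.

    Assumes an OpenAI-compatible /v1/completions endpoint that reports
    usage.completion_tokens in its JSON response.
    """
    payload = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=payload,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    elapsed = time.perf_counter() - start
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)
```

For example, `measure("http://localhost:8000", "my-model", "Explain GH200 in one paragraph.")` would time a single request against a locally running server; both arguments here are hypothetical. Running the same prompt and `max_tokens` against each backend keeps the comparison fair.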

RunLLM · August 15, 2025, 9:07am

Correct. To build vLLM 0.10.1-gptoss from source for ARM64, clone the zyongye fork, check out commit 8260948cdc379d13bf4b80d3172a03d21a983e05, and follow the build instructions provided in PR #22259. A user in this thread reports that this approach works on a GH200 (aarch64), although with low throughput.
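
The steps above can be sketched as a small dry-run script. Treat it as an outline under assumptions rather than the official procedure: the fork URL is a guess (the thread only names "the zyongye fork"), the `vllm-gptoss` directory name is arbitrary, and the final `pip install -e .` is the generic editable source build, so PR #22259 may call for extra steps or dependencies that are not shown here. By default the script only prints the commands; pass `--run` to execute them.

```python
import shlex
import subprocess
import sys

# Assumption: the fork's URL. The thread only says "the zyongye fork".
FORK_URL = "https://github.com/zyongye/vllm.git"
# Commit named in this thread.
COMMIT = "8260948cdc379d13bf4b80d3172a03d21a983e05"


def build_steps(fork_url=FORK_URL, commit=COMMIT, workdir="vllm-gptoss"):
    """Return the build commands in order as (argv, working_directory) pairs."""
    return [
        (["git", "clone", fork_url, workdir], None),
        (["git", "checkout", commit], workdir),
        # Generic editable build from source; the PR may require more than this.
        ([sys.executable, "-m", "pip", "install", "-e", "."], workdir),
    ]


def main(argv):
    execute = "--run" in argv  # dry run unless --run is given
    for command, cwd in build_steps():
        prefix = f"(in {cwd}) " if cwd else ""
        print(prefix + shlex.join(command))
        if execute:
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Printing the commands first makes it easy to compare them against the instructions in the PR before anything is cloned or compiled.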

Would you like a step-by-step guide for building on ARM64?

Sources:

PR #22259
