How to build a vLLM Python wheel that can be used by other GPU types?

I built a vLLM (0.7.2) wheel on a machine with an A10 GPU using `python setup.py bdist_wheel`. vLLM serve works on the build machine. However, when I use the wheel on an A800 GPU machine to run vLLM serve, an error occurs:
engine.py:518] RuntimeError: CUDA error: no kernel image is available for execution on the device
engine.py:518] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
So how can I build a wheel that works across GPU types, like the one vLLM officially provides?

You need to specify multiple GPU architectures manually. You can refer to the script that vLLM uses in its CI: vllm/.github/workflows/scripts/build.sh at main · vllm-project/vllm · GitHub
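A minimal sketch of what that looks like, assuming you set `TORCH_CUDA_ARCH_LIST` before building (the specific architecture list below is an example, not the exact one from the CI script; adjust it for your target GPUs). Note that the A10 is compute capability 8.6 while the A800 is 8.0, so a wheel built only on the A10 has no kernels for the A800, which matches the "no kernel image is available" error.

```shell
# Sketch: build a vLLM wheel with kernels for multiple GPU architectures.
# The arch list here is an illustrative assumption -- pick the compute
# capabilities you actually need (+PTX adds forward-compatible PTX code).
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
python setup.py bdist_wheel
```

Building for many architectures increases compile time and wheel size, so listing only the capabilities you deploy to is a reasonable trade-off.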


Thank you very much, I will try it.