How to build a vLLM Python wheel that can be used by other GPU types?

I built a vLLM (0.7.2) wheel on a machine with an A10 GPU using `python setup.py bdist_wheel`. vLLM serve works on the build machine. However, when I use the wheel on an A800 GPU machine to run vLLM serve, an error occurs:
engine.py:518] RuntimeError: CUDA error: no kernel image is available for execution on the device
engine.py:518] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
So how can I build a wheel that works across GPU types, like the one vLLM officially provides?

You need to specify multiple GPU architectures manually. You can refer to the script that vLLM uses in its CI: vllm/.github/workflows/scripts/build.sh at main · vllm-project/vllm · GitHub
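A minimal sketch of what that looks like, assuming you set `TORCH_CUDA_ARCH_LIST` before building (the specific architecture list below is an example, not the exact one from the CI script; adjust it for your target GPUs). Note that the A10 is compute capability 8.6 while the A800 is 8.0, so a wheel built only on the A10 has no kernels for the A800, which matches the "no kernel image is available" error.

```shell
# Sketch: build a vLLM wheel with kernels for multiple GPU architectures.
# The arch list here is an illustrative assumption -- pick the compute
# capabilities you actually need (+PTX adds forward-compatible PTX code).
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
python setup.py bdist_wheel
```

Building for many architectures increases compile time and wheel size, so listing only the capabilities you deploy to is a reasonable trade-off.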


Thank you very much, I will try it.