Run on B200/5090 without building from source?

Are pre-built wheels available anywhere?

Pre-built wheels are available for vLLM, but only for certain platforms. For NVIDIA CUDA on Linux (compute capability 7.0 or higher), you can install them with pip or uv, e.g., pip install vllm. The default wheels are compiled against CUDA 12.4; additional wheels for CUDA 12.1 and 11.8 are available from GitHub releases or via direct URLs. For other platforms (CPU, AMD ROCm, Intel XPU, Google TPU, AWS Neuron, Intel Gaudi), there are currently no pre-built wheels, so you must build from source (docs, CPU docs).
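To see whether a machine meets the baseline described above (Linux, compute capability 7.0+), a small check like the following may help. This is only a sketch: the function names are made up for illustration, and the GPU query assumes PyTorch is already installed. Passing this check only confirms the stated minimum; it does not confirm that a particular wheel ships kernels for a newer architecture such as the B200 or 5090.

```python
import platform
from typing import Optional, Tuple

# Minimum compute capability quoted in the answer above.
MIN_CAPABILITY = (7, 0)


def meets_stated_wheel_baseline(system: str, capability: Tuple[int, int]) -> bool:
    """Hypothetical helper: True if the platform is Linux and the GPU's
    compute capability is at least the stated minimum."""
    return system == "Linux" and tuple(capability) >= MIN_CAPABILITY


def local_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) for GPU 0, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch  # assumption: PyTorch is installed in this environment
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = local_capability()
    if cap is None:
        print("Could not query a CUDA device (PyTorch missing or no GPU visible).")
    else:
        ok = meets_stated_wheel_baseline(platform.system(), cap)
        print(f"Compute capability {cap[0]}.{cap[1]}; meets stated baseline: {ok}")
```

If the check passes, pip install vllm is the first thing to try; if the installed wheel then fails at runtime on a very new GPU, building from source remains the fallback.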

Would you like more detail on how to find or install these wheels for a specific platform?
