Clarify vLLM Wheels: What Does the +cu129 Tag Actually Change in v0.11.x?

Issue Description:

In version 0.11.1, the release notes state:

Updated the default CUDA build to torch==2.9.0+cu129.

However, I noticed that v0.11.1 and v0.11.2 each ship both a default wheel and a +cu129 wheel, and within the same release the two file sizes differ. For example:

  • vllm-0.11.1-cp38-abi3-manylinux1_x86_64.whl

  • vllm-0.11.1+cu129-cp38-abi3-manylinux1_x86_64.whl

  • vllm-0.11.2-cp38-abi3-manylinux1_x86_64.whl

  • vllm-0.11.2+cu129-cp38-abi3-manylinux1_x86_64.whl

Could you clarify:

  1. What are the actual differences between these builds (e.g., CUDA support, dependencies, optimizations)?

  2. Are there any recommendations on which version to use under what circumstances?

Thank you!

It seems likely that the difference between wheel files like vllm-0.11.1-cp38-abi3-manylinux1_x86_64.whl and vllm-0.11.1+cu129-cp38-abi3-manylinux1_x86_64.whl is the CUDA version they are built against: the +cu129 wheel is explicitly tagged for CUDA 12.9, while the wheel without a suffix is the "default" variant for that release, which may itself be built against CUDA 12.9 but is published as the main build. The file-size difference could come from differing dependencies, build optimizations, or minor code changes between the two builds, but both should support CUDA 12.9 if that is the default for those versions. The variant naming and organization are described in the documentation and the variant organization guide.
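To make the naming concrete: the +cu129 part is what PEP 440 calls a "local version label", embedded in the version field of the standard wheel filename. A minimal sketch that splits the wheel filenames above into their pieces (the helper name is just for illustration):

```python
# Sketch: split a wheel filename into its standard components to see
# what the "+cu129" local-version label actually encodes.
def parse_wheel(filename):
    # Wheel filename layout: {dist}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    name, version, py_tag, abi_tag, plat_tag = filename[:-len(".whl")].split("-")
    # PEP 440: a "+" separates the base version from the local version label.
    base, _, local = version.partition("+")
    return {"name": name, "version": base, "local": local or None,
            "python": py_tag, "abi": abi_tag, "platform": plat_tag}

for f in ["vllm-0.11.1-cp38-abi3-manylinux1_x86_64.whl",
          "vllm-0.11.1+cu129-cp38-abi3-manylinux1_x86_64.whl"]:
    print(parse_wheel(f))
```

Both wheels parse to the same base version 0.11.1; only the local label differs (None vs. cu129), which is consistent with the label marking the CUDA toolchain variant rather than a different release.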

My understanding is that you should use the wheel that matches your CUDA runtime environment. If you have CUDA 12.9, either the default or the +cu129 wheel should work, but using the explicit +cu129 variant may help avoid ambiguity. For other CUDA versions, select the wheel with the matching suffix. Always ensure your PyTorch version matches the CUDA version of the vLLM wheel. For more details, see the installation guide and variant organization.
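In practice, the choice usually comes down to which index you install from. A hedged sketch of the two paths (the index placeholder is an assumption; check the installation guide for the actual URL for your CUDA version):

```shell
# Default wheel from PyPI, built against the release's default CUDA toolchain:
pip install vllm==0.11.1

# Explicit CUDA-tagged variant; PEP 440 local versions like +cu129 are not
# hosted on PyPI, so they come from a separate wheel index:
pip install "vllm==0.11.1+cu129" --extra-index-url <vLLM wheel index for cu129>
```

Pinning the local version makes the CUDA variant explicit in your lockfile, which is the main reason to prefer it over the unsuffixed default when reproducibility matters.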

Would you like a step-by-step breakdown of how to choose and install the correct wheel?

Hint: Mention @RunLLM in the post for followups.