This error (`getMMAVersionSafe(int, DotOp): Assertion 'false && "computeCapability not supported"' failed`) is a known Triton kernel issue when running on new GPU architectures (e.g., Blackwell/RTX 5090) with vLLM and certain quantized models or adapters. The root cause is that the Triton build bundled with vLLM 0.9.0 (pytorch_triton-3.3.0) does not recognize the newest compute capabilities; the fix landed in Triton 3.3.1. Upgrading Triton to 3.3.1 or later should resolve the assertion error, as confirmed in vLLM issue #17569 and related discussions.
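To check whether your environment is affected before upgrading (typically `pip install --upgrade "triton>=3.3.1"`), you can compare the installed version against the first fixed release. A minimal sketch; the `triton_is_new_enough` helper is hypothetical, and 3.3.1 as the minimum fixed version is taken from the discussion above:

```python
def triton_is_new_enough(version: str, minimum: str = "3.3.1") -> bool:
    """Return True if `version` is at least `minimum` (hypothetical helper)."""
    def parse(v: str):
        # Drop any local/build suffix (e.g. "3.3.0+git1234") before comparing
        # numerically, so "3.3.10" correctly sorts above "3.3.9".
        return tuple(int(part) for part in v.split("+")[0].split("."))
    return parse(version) >= parse(minimum)

# The Triton bundled with vLLM 0.9.0 is too old; 3.3.1 includes the fix.
print(triton_is_new_enough("3.3.0"))  # False
print(triton_is_new_enough("3.3.1"))  # True
```

In a real environment you would feed in `triton.__version__` (e.g. via `python -c "import triton; print(triton.__version__)"`) rather than a hard-coded string.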
Would you like more detail on how to upgrade Triton or work around this issue?