This error (`getMMAVersionSafe(int, DotOp): Assertion 'false && "computeCapability not supported"' failed`) is a known Triton kernel issue when running on new GPU architectures (e.g., Blackwell/RTX 5090) with vLLM and certain quantized models or adapters. The root cause is that the Triton build bundled with vLLM 0.9.0 (pytorch_triton-3.3.0) does not recognize the newest compute capabilities; the fix landed in Triton 3.3.1. Upgrading Triton to 3.3.1 or later should resolve the assertion error, as confirmed in vLLM issue #17569 and related discussions.
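To check whether your environment is affected before upgrading (typically `pip install --upgrade "triton>=3.3.1"`), you can compare the installed version against the first fixed release. A minimal sketch; the `triton_is_new_enough` helper is hypothetical, and 3.3.1 as the minimum fixed version is taken from the discussion above:

```python
def triton_is_new_enough(version: str, minimum: str = "3.3.1") -> bool:
    """Return True if `version` is at least `minimum` (hypothetical helper)."""
    def parse(v: str):
        # Drop any local/build suffix (e.g. "3.3.0+git1234") before comparing
        # numerically, so "3.3.10" correctly sorts above "3.3.9".
        return tuple(int(part) for part in v.split("+")[0].split("."))
    return parse(version) >= parse(minimum)

# The Triton bundled with vLLM 0.9.0 is too old; 3.3.1 includes the fix.
print(triton_is_new_enough("3.3.0"))  # False
print(triton_is_new_enough("3.3.1"))  # True
```

In a real environment you would feed in `triton.__version__` (e.g. via `python -c "import triton; print(triton.__version__)"`) rather than a hard-coded string.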
Would you like more detail on how to upgrade Triton or work around this issue?