RDNA4 FP8 support

AnttiK · May 2, 2026, 12:03am

Does vLLM support FP8 with RDNA4?

RunLLM · May 2, 2026, 12:04am

Yes. Recent vLLM versions support FP8 quantization and inference on AMD RDNA4 (gfx12xx) GPUs, including native hardware-accelerated FP8 matrix multiplication and an FP8 KV cache, provided the necessary kernel and config patches are present. This brings memory savings and throughput improvements for FP8 models on RDNA4, similar to FP8 on MI300/CDNA4 and NVIDIA Hopper/Ada GPUs. Upstream support is still evolving, so some manual configuration or patching may be needed for best performance and full model compatibility. See the linked issues and PRs for details and best practices on enabling FP8 on RDNA4 hardware.
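As a rough illustration of what enabling both features looks like from the Python API, here is a minimal sketch. It assumes the standard `quantization="fp8"` and `kv_cache_dtype="fp8"` engine arguments are honored on your RDNA4 build, which depends on your vLLM version and on the patches mentioned above. The helper names `fp8_engine_kwargs` and `build_llm` and the model name are placeholders for illustration, not something taken from the linked issues.

```python
from typing import Any, Dict


def fp8_engine_kwargs(model: str) -> Dict[str, Any]:
    """Collect the engine arguments that request FP8 weights and an FP8 KV cache.

    Assumption: the installed vLLM build accepts these values on RDNA4.
    """
    return {
        "model": model,
        "quantization": "fp8",    # quantize weights to FP8 at load time
        "kv_cache_dtype": "fp8",  # store attention keys/values in FP8
    }


def build_llm(model: str):
    """Create a vLLM engine with the FP8 settings above.

    vLLM is imported lazily so the kwargs helper can be inspected
    on a machine without vLLM or a GPU.
    """
    from vllm import LLM

    return LLM(**fp8_engine_kwargs(model))
```

On a machine with a working vLLM install you would call something like `llm = build_llm("facebook/opt-125m")` and then `llm.generate(["Hello"])`. The model name is only a small placeholder; whether a given model runs correctly in FP8 on RDNA4 is something to verify against the linked issues.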

For more information and technical details, see this feature request and implementation guide, as well as the relevant PRs, official documentation, and quantized KV cache docs.
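To give a feel for the memory savings from an FP8 KV cache, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128) and the 4 GiB cache budget are illustrative assumptions, and the calculation ignores the small overhead of scaling factors and allocator block granularity.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    """Bytes needed to cache one token: a key and a value vector per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


# Illustrative (assumed) model shape and cache budget.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
BUDGET_BYTES = 4 * 1024**3  # 4 GiB reserved for the KV cache

fp16_per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 2)
fp8_per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 1)

print(f"FP16: {fp16_per_token // 1024} KiB/token, "
      f"{BUDGET_BYTES // fp16_per_token} tokens fit in 4 GiB")
print(f"FP8:  {fp8_per_token // 1024} KiB/token, "
      f"{BUDGET_BYTES // fp8_per_token} tokens fit in 4 GiB")
```

Under these assumptions the FP8 cache holds twice as many tokens in the same memory, which is where the extra context length or batch capacity comes from.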

Would you like more detail on setup, kernel configs, or troubleshooting?

Hint: Mention @RunLLM in the post for followups.

Related topics

Topic	Category	Replies	Views	Activity
Native FP8 WMMA Support for AMD RDNA4 (RX 9070 XT / R9700) in vLLM	General	5	2405	February 12, 2026
KV Cache quantizing?	NVIDIA GPU Support	3	1164	June 2, 2025
NVFP4 Support In Attention KV-Cache		1	770	March 16, 2026
RuntimeError: Int8 not supported on SM120. Use FP8 quantization instead, or run on older arch (SM < 100)	NVIDIA GPU Support	1	242	November 19, 2025
Support for RTX 6000 Blackwell 96GB card	NVIDIA GPU Support	5	6950	January 5, 2026