Native FP8 WMMA Support for AMD RDNA4 (RX 9070 XT / R9700) in vLLM

RunLLM · February 12, 2026, 5:10am

Native FP8 support for RDNA4 (gfx1201) is not yet fully merged into vLLM mainline; per community discussions, it still requires local patches and kernel config files. The key PRs for running amd/gpt-oss-20b in FP8 on AMD GPUs (including the R9700) are #29008 (amd-quark format, model loading, and emulation) and #31962 (MoE computation order fix). Both are merged, but you may need vLLM 0.15+ and should confirm that your local checkout includes them. On RDNA4 you must also add the kernel configs and disable AITER (VLLM_ROCM_USE_AITER=0), because RDNA4-specific WMMA support is not yet upstream in main.
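As a rough illustration of the version and AITER points above, here is a minimal Python sketch. The helper names (`version_at_least`, `rdna4_env`, `serve_command`) are hypothetical and not part of vLLM; only the `VLLM_ROCM_USE_AITER=0` setting, the `0.15+` threshold, and the model name come from the post. It does not cover the local patches or kernel config files, which still have to be applied separately.

```python
import os
import re

# Assumption: "vLLM 0.15+" from the post, read as a major.minor floor.
MIN_VLLM = (0, 15)


def version_at_least(version: str, minimum: tuple = MIN_VLLM) -> bool:
    """Compare the leading major.minor of a version string to a minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False  # unparseable version strings are treated as too old
    return (int(match.group(1)), int(match.group(2))) >= minimum


def rdna4_env(base=None) -> dict:
    """Return a copy of the environment with AITER disabled."""
    env = dict(os.environ if base is None else base)
    env["VLLM_ROCM_USE_AITER"] = "0"
    return env


def serve_command(model: str = "amd/gpt-oss-20b") -> list:
    """Build the command line for the standard `vllm serve` entry point."""
    return ["vllm", "serve", model]


if __name__ == "__main__":
    import subprocess
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("vllm")
    except PackageNotFoundError:
        raise SystemExit("vllm is not installed in this environment")

    if not version_at_least(installed):
        raise SystemExit(f"vllm {installed} is older than 0.15")

    # Launch the server with AITER disabled for this process only.
    subprocess.run(serve_command(), env=rdna4_env(), check=True)
```

Setting the variable on a copy of the environment keeps the change scoped to the launched server rather than the whole shell session.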

Community reports of FP8 models running on AMD include Qwen3, Mixtral, and GPT-OSS variants; Qwen3-30B and GPT-OSS-20B are cited as working well on the R9700 once the patches and configs above are applied. No single PR tracks full RDNA4 FP8 support; progress is documented in issues such as #28649 and related discussions. For step-by-step details, see the implementation notes and config requirements in those issues and PRs.

Would you like a detailed step-by-step guide for patching and running FP8 models on R9700?

