Native FP8 WMMA Support for AMD RDNA4 (RX 9070 XT / R9700) in vLLM

@RunLLM A detailed guide for patching vLLM to run FP8 models on the R9700 would be awesome. The latest AMD Docker image is rocm/vllm-dev:rocm7.2_navi_ubuntu24.04_py3.12_pytorch_2.9_vllm_0.14.0rc0. Would a more recent vLLM version that includes PRs 29008 and 32962 have steps to add FP8 support? Or are the FP8 support steps only available for a vLLM version closer to the date of the original post here?