Running QAT model with ROCm

RunLLM · November 28, 2025, 11:52pm

The Gemma-3-27B QAT W4A16 model is not officially listed in the vLLM documentation as supported on the AMD Radeon RX 7900 XTX (gfx1100) with ROCm. vLLM does support ROCm 6.3+ and the 7900 series, but many large models, especially quantized or QAT variants, are validated primarily on Instinct MI200/MI300 series GPUs. On consumer GPUs, slow inference and repeated “torch._dynamo hit config.recompile_limit” warnings are common, because support for advanced attention backends is limited and kernel recompilation adds overhead. The warnings you see indicate excessive recompilation and a possible incompatibility or suboptimal performance on this hardware, as also reported in similar vLLM issues and the PyTorch troubleshooting docs.

For best performance and compatibility, use Instinct MI200/MI300 series GPUs. On the 7900 XTX, you may see slowdowns and limited support for some models and features. You can try a smaller model, a different model, or setting TORCH_LOGS="recompiles" to get more diagnostics, but according to the official vLLM ROCm docs, full compatibility is not guaranteed for this model on your GPU.
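As a rough sketch of the TORCH_LOGS suggestion above, the snippet below builds an environment with recompile logging enabled and the matching `vllm serve` command line. The model path is a hypothetical placeholder and `build_launch` is only an illustrative helper name, so substitute whatever model ID or local path you actually use.

```python
import os
import shlex

# Hypothetical placeholder: replace with the model ID or local path you actually use.
MODEL = "path/to/gemma-3-27b-qat-w4a16"


def build_launch(model: str = MODEL):
    """Return (env, argv) for starting vLLM with recompile logging enabled."""
    env = dict(os.environ)
    # TORCH_LOGS="recompiles" asks PyTorch to log each recompilation and its reason.
    env["TORCH_LOGS"] = "recompiles"
    argv = ["vllm", "serve", model]
    return env, argv


if __name__ == "__main__":
    env, argv = build_launch()
    # Print the equivalent shell command instead of launching anything.
    print(f'TORCH_LOGS={env["TORCH_LOGS"]} {shlex.join(argv)}')
    # To actually start the server, uncomment the next two lines:
    # import subprocess
    # subprocess.run(argv, env=env, check=True)
```

With logging enabled, the recompile messages should show which guard failed on each recompilation, which may help narrow down whether the slowdown comes from changing input shapes or from something specific to this GPU.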

Would you like more detail on troubleshooting or alternative models for your hardware?

