Running QAT model with ROCm

The Gemma-3-27B QAT W4A16 model is not officially listed as supported on the AMD Radeon RX 7900 XTX (gfx1100) with ROCm in the vLLM documentation. vLLM does support ROCm 6.3+ and the 7900 series, but many large models, especially quantized or QAT variants, are primarily validated on AMD Instinct MI200/MI300 GPUs. On consumer GPUs, slow inference and repeated “torch._dynamo hit config.recompile_limit” warnings are common, due to limited support for advanced attention backends and the overhead of kernel recompilation. The warnings you see indicate excessive recompilation and suggest incompatibility or suboptimal performance on this hardware, as also reported in similar vLLM issues and the PyTorch troubleshooting docs.

For best performance and compatibility, use AMD Instinct MI200/MI300 GPUs. On the 7900 XTX, you may see slowdowns and limited support for some models and features. You can try reducing the model size, using a different model, or setting TORCH_LOGS="recompiles" for more diagnostics, but according to the official vLLM ROCm docs, full compatibility is not guaranteed for this model on your GPU.
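
To gather those diagnostics, here is a minimal sketch of one way to enable recompile logging and count how often the warning appears. It is an illustration, not part of vLLM or PyTorch: the helper names (`diagnostic_env`, `count_recompile_warnings`, `run_with_diagnostics`) are hypothetical, and it assumes you run a short, finite script of your own rather than a long-lived server.

```python
import os
import subprocess

# Substring taken from the warning quoted above.
RECOMPILE_MARKER = "config.recompile_limit"


def diagnostic_env(base=None):
    """Return a copy of the environment with TORCH_LOGS=recompiles set."""
    env = dict(os.environ if base is None else base)
    env["TORCH_LOGS"] = "recompiles"
    return env


def count_recompile_warnings(log_text):
    """Count log lines that mention the recompile limit."""
    return sum(1 for line in log_text.splitlines() if RECOMPILE_MARKER in line)


def run_with_diagnostics(cmd):
    """Run cmd to completion with recompile logging enabled.

    Returns (exit code, number of recompile-limit warnings seen).
    Only suitable for commands that terminate on their own.
    """
    proc = subprocess.run(
        cmd, env=diagnostic_env(), capture_output=True, text=True
    )
    return proc.returncode, count_recompile_warnings(proc.stdout + proc.stderr)
```

For example, `run_with_diagnostics(["python", "my_inference_script.py"])`, where `my_inference_script.py` is a placeholder for your own short test script, returns the exit code and the warning count. If the count stays high across runs, that is consistent with the excessive recompilation described above.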

Would you like more detail on troubleshooting or alternative models for your hardware?

