Hi,
I have a 7900 XTX on Ubuntu 24.04 and have pulled rocm/vllm:latest.
Am I able to run these Google Gemma 3 QAT models on ROCm?
You can run Google Gemma 3 QAT (Quantization Aware Training) models on a 7900 XTX with ROCm, but there are important caveats. vLLM supports AMD GPUs with ROCm 6.3+, and the 7900 XTX is listed as supported hardware. However, quantization support is limited: GGUF-quantized Gemma 3 models are only partially supported, and some users report issues such as empty outputs or errors when running QAT GGUF models on ROCm/consumer GPUs. Full support and stability for Gemma 3 QAT GGUF models are not guaranteed on ROCm, and some quantization methods (like AWQ) are only optimized for MI300X, not consumer cards like the 7900 XTX. You may need to build vLLM from source and use the model's local path, not a Hugging Face repo ID, for GGUF models. See the vLLM ROCm install docs and the Gemma 3 GGUF support discussion.
If you want to try, follow the ROCm build instructions, use the latest vLLM built from source, and run the model from a local GGUF file. Be aware that performance may be slow and you may encounter errors or empty outputs, especially with QAT GGUF models. For best results, use unquantized or non-QAT models, or consider an MI300X for full quantization support. Would you like a step-by-step guide for setup and troubleshooting?
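For reference, a container launch along these lines is typical for ROCm images (a minimal sketch: the device and group flags follow the standard ROCm container documentation, and the volume path is only an example):

docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  --ipc=host --shm-size 16g \
  -v /path/to/models:/models \
  rocm/vllm:latest

This starts the container with the GPU devices exposed, so vLLM inside the container can see the 7900 XTX.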
Hint: Mention @RunLLM in the post for followups.
Yes, @RunLLM
I need step-by-step instructions. I can already run unquantized Gemma 3 models with my setup, but I need to know what is needed to run any quantized Gemma 3 model.
To run quantized Gemma 3 models (e.g., QAT or GGUF) on your AMD 7900 XTX with ROCm, you must use vLLM built from source, and run the model from a local GGUF file. Note that GGUF quantization support on consumer AMD GPUs is experimental and may result in errors or empty outputs, especially for QAT models. Here are the steps:
1. Download the Quantized GGUF Model:
Go to the Hugging Face page for your desired Gemma 3 quantized model (e.g., QAT GGUF) and download the .gguf file to your local machine.
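For example, with the Hugging Face CLI (the repo ID and filename below are illustrative; check the exact names on the model page):

huggingface-cli download google/gemma-3-27b-it-qat-q4_0-gguf \
  gemma-3-27b-it-q4_0.gguf --local-dir ./models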
2. Build vLLM from Source for ROCm:
Follow the official ROCm installation guide to build vLLM for ROCm. Use the provided Dockerfile or build instructions for ROCm, ensuring you have ROCm 6.3+ and all dependencies.
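A minimal sketch of the Docker-based build, assuming the Dockerfile location used by recent vLLM checkouts (it has moved between Dockerfile.rocm and docker/Dockerfile.rocm across versions, so check your source tree):

git clone https://github.com/vllm-project/vllm.git
cd vllm
DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm -t vllm-rocm .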
3. Run the Model with vLLM:
Use the following command, replacing the paths as needed:
vllm serve /path/to/gemma-3-27b-it-q4_0.gguf \
--tokenizer google/gemma-3-27b-it \
--gpu-memory-utilization 0.95 \
--max-model-len 4096
Note: for a GGUF file, point --tokenizer at the original Hugging Face model (--tokenizer google/gemma-3-27b-it); if vLLM cannot read the model configuration from the GGUF file, also pass --hf-config-path google/gemma-3-27b-it.
Troubleshooting: GGUF support on consumer AMD GPUs is experimental, so you may still hit errors or empty outputs, especially with QAT models. If that happens, confirm you are running a vLLM build from source against ROCm 6.3+ and that you passed a local .gguf path rather than a Hugging Face repo ID.
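Once the server is up (port 8000 by default), a quick request to the OpenAI-compatible endpoint shows whether you get real tokens back or empty outputs; the model field should match the path you passed to vllm serve:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/path/to/gemma-3-27b-it-q4_0.gguf",
    "messages": [{"role": "user", "content": "Hello, can you hear me?"}],
    "max_tokens": 64
  }'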
Would you like more detail on any specific step or error message?
Hint: Mention @RunLLM in the post for followups.