Hi,
I have a 7900 XTX on Ubuntu 24.04 and have pulled rocm/vllm:latest.
Am I able to run these Google Gemma 3 QAT models on ROCm?
You can run Google Gemma 3 QAT (Quantization Aware Training) models on a 7900 XTX with ROCm, but there are important caveats. vLLM supports AMD GPUs with ROCm 6.3+, and the 7900 XTX is listed as supported hardware. However, quantization support is limited: GGUF-quantized Gemma 3 models are only partially supported, and some users report issues such as empty outputs or errors when running QAT GGUF models on ROCm/consumer GPUs. Full support and stability for Gemma 3 QAT GGUF models are not guaranteed on ROCm, and some quantization methods (like AWQ) are only optimized for MI300X, not consumer cards like the 7900 XTX. You may need to build vLLM from source and use the model's local path, not a Hugging Face repo ID, for GGUF models. See the vLLM ROCm install docs and the Gemma 3 GGUF support discussion.
If you want to try, follow the ROCm build instructions, use the latest vLLM built from source, and run the model from a local GGUF file. Be aware that performance may be slow and you may encounter errors or empty outputs, especially with QAT GGUF models. For best results, use unquantized or non-QAT models, or consider an MI300X for full quantization support. Would you like a step-by-step guide for setup and troubleshooting?
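For reference, a container launch along these lines is typical for ROCm images (a minimal sketch: the device and group flags follow the standard ROCm container documentation, and the volume path is only an example):

docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  --ipc=host --shm-size 16g \
  -v /path/to/models:/models \
  rocm/vllm:latest

This starts the container with the GPU devices exposed, so vLLM inside the container can see the 7900 XTX.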
Hint: Mention @RunLLM in the post for followups.
Yes, @RunLLM
I need step-by-step instructions. I can already run unquantized Gemma 3 models with my setup, but I need to know what is needed to run any quantized Gemma 3 model.
To run quantized Gemma 3 models (e.g., QAT or GGUF) on your AMD 7900 XTX with ROCm, you must use vLLM built from source, and run the model from a local GGUF file. Note that GGUF quantization support on consumer AMD GPUs is experimental and may result in errors or empty outputs, especially for QAT models. Here are the steps:
1. Download the Quantized GGUF Model:
Go to the Hugging Face page for your desired Gemma 3 quantized model (e.g., QAT GGUF) and download the .gguf file to your local machine.
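For example, with the Hugging Face CLI (the repo ID and filename below are illustrative; check the exact names on the model page):

huggingface-cli download google/gemma-3-27b-it-qat-q4_0-gguf \
  gemma-3-27b-it-q4_0.gguf --local-dir ./models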
2. Build vLLM from Source for ROCm:
Follow the official ROCm installation guide to build vLLM for ROCm. Use the provided Dockerfile or build instructions for ROCm, ensuring you have ROCm 6.3+ and all dependencies.
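A minimal sketch of the Docker-based build, assuming the Dockerfile location used by recent vLLM checkouts (it has moved between Dockerfile.rocm and docker/Dockerfile.rocm across versions, so check your source tree):

git clone https://github.com/vllm-project/vllm.git
cd vllm
DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm -t vllm-rocm .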
3. Run the Model with vLLM:
Use the following command, replacing the paths as needed:
vllm serve /path/to/gemma-3-27b-it-q4_0.gguf \
--tokenizer google/gemma-3-27b-it \
--gpu-memory-utilization 0.95 \
--max-model-len 4096
Note: for a GGUF file, point --tokenizer at the original Hugging Face model (--tokenizer google/gemma-3-27b-it); if vLLM cannot read the model configuration from the GGUF file, also pass --hf-config-path google/gemma-3-27b-it.
Troubleshooting: GGUF support on consumer AMD GPUs is experimental, so you may still hit errors or empty outputs, especially with QAT models. If that happens, confirm you are running a vLLM build from source against ROCm 6.3+ and that you passed a local .gguf path rather than a Hugging Face repo ID.
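Once the server is up (port 8000 by default), a quick request to the OpenAI-compatible endpoint shows whether you get real tokens back or empty outputs; the model field should match the path you passed to vllm serve:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/path/to/gemma-3-27b-it-q4_0.gguf",
    "messages": [{"role": "user", "content": "Hello, can you hear me?"}],
    "max_tokens": 64
  }'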
Would you like more detail on any specific step or error message?
Hint: Mention @RunLLM in the post for followups.