To enable vision (image) input for gemma-3-12b-it in vLLM, you should add the --limit-mm-per-prompt '{"image":1}' flag to your command. This allows the model to accept one image per prompt. For example:
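A sketch of the launch command, using the /workspace/models path from your logs — the port and path are assumptions, so adjust them to your environment:

```shell
# Hypothetical vLLM launch command; substitute your own model path and port.
# --limit-mm-per-prompt '{"image":1}' allows one image per prompt.
vllm serve /workspace/models/gemma-3-12b-it \
  --port 8000 \
  --limit-mm-per-prompt '{"image":1}'
```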
How do I find out the correct model name/path to set in the application?
INFO: 192.168.49.122:49900 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
ERROR 07-25 18:43:32 [serving_chat.py:139] Error with model object='error' message='The model google/gemma-3-12b-it does not exist.' type='NotFoundError' param=None code=404
INFO: 192.168.49.122:60612 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
ERROR 07-25 18:43:36 [serving_chat.py:139] Error with model object='error' message='The model google/gemma-3-12b-it does not exist.' type='NotFoundError' param=None code=404
INFO: 192.168.49.122:56218 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
ERROR 07-25 18:45:19 [serving_chat.py:139] Error with model object='error' message='The model /workspace/models/gemma-3-12b-it does not exist.' type='NotFoundError' param=None code=404
INFO: 192.168.49.122:54660 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
The correct model name for your application is the value you set with the --served-model-name flag when launching vLLM. If you do not set this flag, the default is the basename of your model path (e.g., gemma-3-12b-it if you serve /workspace/models/gemma-3-12b-it). Your application must use this exact name in the "model" field of API requests. See vLLM issue #19998 and the serving docs.
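Since the default served name is just the last path component of the model directory, you can preview what vLLM will use straight from a shell:

```shell
# The default served model name is the basename of the model path.
basename /workspace/models/gemma-3-12b-it
# prints: gemma-3-12b-it
```

This matches the name your application must send, not the full filesystem path (which is why the /workspace/models/gemma-3-12b-it request in your logs also returned 404).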
To check available model names, call the /v1/models endpoint on your vLLM server.
Would you like a sample API call to list available models?
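Here is one way to do that, assuming the server listens on localhost:8000 (substitute your own address). The endpoint returns an OpenAI-style JSON list, so a small Python one-liner can pull out just the model names:

```shell
# Print the model names this server accepts in the "model" field.
# http://localhost:8000 is an assumed address; point it at your vLLM server.
curl -s http://localhost:8000/v1/models \
  | python3 -c 'import json,sys; print("\n".join(m["id"] for m in json.load(sys.stdin)["data"]))'
```

Whatever name this prints is exactly the string your application should put in the "model" field of its /v1/chat/completions requests.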