To enable vision (image) input for gemma-3-12b-it in vLLM, you should add the --limit-mm-per-prompt '{"image":1}' flag to your command. This allows the model to accept one image per prompt. For example:
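A sketch of the launch command, using the /workspace/models path from your logs — the port and path are assumptions, so adjust them to your environment:

```shell
# Hypothetical vLLM launch command; substitute your own model path and port.
# --limit-mm-per-prompt '{"image":1}' allows one image per prompt.
vllm serve /workspace/models/gemma-3-12b-it \
  --port 8000 \
  --limit-mm-per-prompt '{"image":1}'
```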
How do I find out the correct model name/path to set in the application?
INFO: 192.168.49.122:49900 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
ERROR 07-25 18:43:32 [serving_chat.py:139] Error with model object='error' message='The model google/gemma-3-12b-it does not exist.' type='NotFoundError' param=None code=404
INFO: 192.168.49.122:60612 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
ERROR 07-25 18:43:36 [serving_chat.py:139] Error with model object='error' message='The model google/gemma-3-12b-it does not exist.' type='NotFoundError' param=None code=404
INFO: 192.168.49.122:56218 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
ERROR 07-25 18:45:19 [serving_chat.py:139] Error with model object='error' message='The model /workspace/models/gemma-3-12b-it does not exist.' type='NotFoundError' param=None code=404
INFO: 192.168.49.122:54660 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
The correct model name for your application is the value you set with the --served-model-name flag when launching vLLM. If you do not set this flag, the default is the basename of your model path (e.g., gemma-3-12b-it if you serve /workspace/models/gemma-3-12b-it). Your application must use this exact name in the "model" field of API requests. See vLLM issue #19998 and the serving docs.
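Since the default served name is just the last path component of the model directory, you can preview what vLLM will use straight from a shell:

```shell
# The default served model name is the basename of the model path.
basename /workspace/models/gemma-3-12b-it
# prints: gemma-3-12b-it
```

This matches the name your application must send, not the full filesystem path (which is why the /workspace/models/gemma-3-12b-it request in your logs also returned 404).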
To check available model names, call the /v1/models endpoint on your vLLM server.
Would you like a sample API call to list available models?
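Here is one way to do that, assuming the server listens on localhost:8000 (substitute your own address). The endpoint returns an OpenAI-style JSON list, so a small Python one-liner can pull out just the model names:

```shell
# Print the model names this server accepts in the "model" field.
# http://localhost:8000 is an assumed address; point it at your vLLM server.
curl -s http://localhost:8000/v1/models \
  | python3 -c 'import json,sys; print("\n".join(m["id"] for m in json.load(sys.stdin)["data"]))'
```

Whatever name this prints is exactly the string your application should put in the "model" field of its /v1/chat/completions requests.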