Omitting the vision stack for gemma3?

malte · March 22, 2025, 4:04pm

The Gemma 3 documentation mentions that you can omit the vision stack by loading the gemma-3-27b-it model with Gemma3ForCausalLM instead of Gemma3ForConditionalGeneration. Is there a way to do this with vLLM? I tried overriding the model’s config.json, but I’m running into issues with missing configuration parameters:

AttributeError: 'Gemma3Config' object has no attribute 'num_hidden_layers'

Possibly similar to this GitHub issue: AttributeError: 'Gemma3Config' object has no attribute 'vocab_size' (Issue #36683, huggingface/transformers)
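To illustrate where an error like this can come from, here is a minimal sketch. It assumes the multimodal config.json keeps the language-model settings nested under a `text_config` key, so a top-level lookup of `num_hidden_layers` fails. The dictionary is a hypothetical, heavily trimmed stand-in with placeholder values, and `find_key` and `text_only_view` are made-up helper names, not part of vLLM or transformers. Flattening the config this way is not a confirmed workaround: it says nothing about how the checkpoint weights are named or whether vLLM will accept the result.

```python
import json

# Hypothetical, trimmed stand-in for a multimodal Gemma 3 config.json.
# All values are placeholders, not the real gemma-3-27b-it settings.
multimodal_config = {
    "architectures": ["Gemma3ForConditionalGeneration"],
    "model_type": "gemma3",
    "text_config": {
        "model_type": "gemma3_text",
        "num_hidden_layers": 4,
        "vocab_size": 1000,
    },
    "vision_config": {"hidden_size": 8},
}


def find_key(config, key):
    """Return the dotted paths at which `key` appears in a nested dict."""
    paths = []

    def walk(node, prefix):
        for name, value in node.items():
            path = f"{prefix}.{name}" if prefix else name
            if name == key:
                paths.append(path)
            if isinstance(value, dict):
                walk(value, path)

    walk(config, "")
    return paths


def text_only_view(config):
    """Promote text_config to the top level and drop the vision settings.

    Illustrative only: a real text-only load would also need matching
    weight names in the checkpoint.
    """
    flat = dict(config["text_config"])
    flat["architectures"] = ["Gemma3ForCausalLM"]
    return flat


# The key exists only inside the nested text config, not at the top level.
print(find_key(multimodal_config, "num_hidden_layers"))
print(json.dumps(text_only_view(multimodal_config), indent=2))
```

If the real config.json has this shape, changing only the `architectures` entry would leave `num_hidden_layers` out of reach at the top level, which would match the AttributeError above.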

ywang96 · March 23, 2025, 6:44am

Currently we don’t provide a way to allow users to load only the language model from a multimodal model, but this is indeed a feature that we’re considering supporting!
