Not able to run google/gemma-3n-E4B-it

I am not able to run google/gemma-3n-E4B-it on local vLLM with 2 GPUs.

I'm trying to run it with:

docker run -e VLLM_SLEEP_WHEN_IDLE=1 -it --gpus all --ipc=host --network host -p 8000:8000 -v /home/ubuntu/vllm_models:/root/.cache/huggingface -e HUGGING_FACE_HUB_TOKEN="" --name vllm_tp2-gemma-3n-E4B-it vllm-openai-nccl --tensor-parallel-size 2 --model google/gemma-3n-E4B-it --served-model-name vllm/gemma-3n --max-model-len 2048 --max-num-seqs 64

gemma-3-12b-it works fine.

ImportError: The model architecture 'mobilenetv5_300m_enc' is not supported in your version of timm (0.9.10). Please upgrade timm to a more recent version with pip install -U timm.
(VllmWorker TP0 pid=152) INFO 09-22 14:18:21 [multiproc_executor.py:520] Parent process exited, terminating worker
(VllmWorker TP1 pid=153) INFO 09-22 14:18:21 [multiproc_executor.py:520] Parent process exited, terminating worker
[rank0]:[W922 14:18:22.767945607 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see Distributed communication package - torch.distributed — PyTorch 2.8 documentation (function operator())

The error occurs because the timm version in your Docker image (0.9.10) does not support the mobilenetv5_300m_enc architecture required by Gemma 3n models. You need to upgrade timm to at least 1.0.17 and also ensure transformers is at least 4.55.4 for full Gemma 3n support. This is a known issue and has been fixed in newer Docker images and code. As a workaround, you can run pip install --upgrade timm==1.0.19 transformers==4.55.4 inside your container before starting vLLM, or rebuild your Docker image with these versions pre-installed. See the official fix and workaround details in this issue and this PR.

Would you like step-by-step instructions for updating your Docker image or container?


How can I get the latest Docker image where these are fixed?
sudo docker pull vllm/vllm-openai:latest
I have done that, but it looks like it's not the latest?
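One way to tell whether a pulled image actually carries the fix is to check the installed timm version from inside the container (e.g. via `docker run --rm --entrypoint python3 vllm/vllm-openai:latest -c "..."`). A minimal sketch of such a check; the 1.0.17 threshold comes from the answer above, and the naive version parsing is an assumption that holds for plain x.y.z version strings:

```python
# Check whether the installed timm is new enough for Gemma 3n
# (>= 1.0.17 per the answer above; verify against the vLLM release notes).
from importlib.metadata import version, PackageNotFoundError

REQUIRED = (1, 0, 17)

def parse(v: str) -> tuple:
    # Naive numeric parse; fine for plain "x.y.z" version strings,
    # not for dev/rc suffixes.
    return tuple(int(p) for p in v.split(".")[:3])

def timm_is_new_enough() -> bool:
    try:
        return parse(version("timm")) >= REQUIRED
    except PackageNotFoundError:
        return False
```

With this, the broken image's 0.9.10 compares below the threshold while 1.0.19 passes, which is a quick way to confirm whether `:latest` really picked up the fixed dependency set.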

I upgraded timm, but now I'm getting this:

ERROR 09-22 15:07:24 [core.py:700] RuntimeError: Worker failed with error 'mat1 and mat2 shapes cannot be multiplied (1792x2048 and 1024x2048)', please check the stack trace above for the root cause
(EngineCore_0 pid=102) ERROR 09-22 15:07:26 [multiproc_executor.py:146] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.
(EngineCore_0 pid=102) Process EngineCore_0:
(EngineCore_0 pid=102) Traceback (most recent call last):
(EngineCore_0 pid=102) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=102) self.run()
(EngineCore_0 pid=102) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=102) self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=102) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 704, in run_engine_core
(EngineCore_0 pid=102) raise e
(EngineCore_0 pid=102) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 691, in run_engine_core
(EngineCore_0 pid=102) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=102) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=102) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 492, in __init__
(EngineCore_0 pid=102) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=102) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 89, in __init__
(EngineCore_0 pid=102) self._initialize_kv_caches(vllm_config)
(EngineCore_0 pid=102) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 179, in _initialize_kv_caches
(EngineCore_0 pid=102) self.model_executor.determine_available_memory())
(EngineCore_0 pid=102) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=102) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
(EngineCore_0 pid=102) output = self.collective_rpc("determine_available_memory")
(EngineCore_0 pid=102) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=102) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 243, in collective_rpc
(EngineCore_0 pid=102) result = get_response(w, dequeue_timeout)
(EngineCore_0 pid=102) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=102) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 230, in get_response
(EngineCore_0 pid=102) raise RuntimeError(
(EngineCore_0 pid=102) RuntimeError: Worker failed with error 'mat1 and mat2 shapes cannot be multiplied (1792x2048 and 1024x2048)', please check the stack trace above for the root cause
(APIServer pid=1) Traceback (most recent call last):
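For reference, the final error is the generic matmul inner-dimension rule: an (m x k) matrix can only be multiplied by a (k2 x n) matrix when k equals k2, and here 2048 != 1024. A minimal sketch of that shape check (the interpretation that this indicates the checkpoint and the model code disagree on a hidden size, e.g. because transformers also still needs upgrading per the earlier answer, is an assumption, not something confirmed in this thread):

```python
# Reproduce the failure class from the log: a matmul whose inner
# dimensions disagree, as in "(1792x2048 and 1024x2048)".
def matmul_shape(a_shape, b_shape):
    m, k = a_shape
    k2, n = b_shape
    if k != k2:
        # Same wording as the PyTorch runtime error in the traceback above.
        raise RuntimeError(
            f"mat1 and mat2 shapes cannot be multiplied ({m}x{k} and {k2}x{n})"
        )
    return (m, n)
```

Feeding in the shapes from the log, matmul_shape((1792, 2048), (1024, 2048)) raises exactly this error, because 2048 (columns of mat1) does not match 1024 (rows of mat2).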