Hi
What is the docker run command for this model, with 2x 7900 XTX GPUs, ROCm 7.0, and vLLM 0.10.2?
To run the google/gemma-3-27b-it-qat-q4_0-gguf model on 2x 7900 XTX GPUs with ROCm 7.0 and vLLM 0.10.2, use the official ROCm vLLM Docker image. Mount your local model directory and specify the model file path inside the container. Example command:
docker run -it --rm \
--network=host \
--group-add=video \
--ipc=host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--device /dev/kfd \
--device /dev/dri \
-v /path/to/your/models:/app/models \
-e HF_HOME="/app/models" \
rocm/vllm-dev:nightly \
vllm serve /app/models/gemma-3-27b-it-qat-q4_0.gguf --tensor-parallel-size 2
Replace /path/to/your/models with the directory containing your .gguf file. This setup is based on the official ROCm installation instructions and quickstart guide.
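Since vLLM cannot load a GGUF checkpoint straight from a remote repo, the .gguf file has to exist in that directory before the command above will work. One way to fetch it is with huggingface-cli (the exact file name inside the repo is an assumption here; check the repo's file listing and adjust):

```shell
# Download the single GGUF file from the Hugging Face repo into the
# directory that will be mounted into the container.
# NOTE: the file name gemma-3-27b-it-q4_0.gguf is assumed; verify it
# against the repo's file list before running.
huggingface-cli download google/gemma-3-27b-it-qat-q4_0-gguf \
  gemma-3-27b-it-q4_0.gguf \
  --local-dir /path/to/your/models
```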
Would you like more detail on customizing parameters or troubleshooting?
Hint: Mention @RunLLM in the post for followups.
@RunLLM so GGUF models do not need any special tokenizer etc.?
No, GGUF models in vLLM generally require you to specify the tokenizer explicitly, using the original base model’s tokenizer, because GGUF tokenizer extraction is not always reliable or supported. For example, use --tokenizer google/gemma-3-27b-it when serving a GGUF model. Automatic extraction from GGUF is being improved, but as of now, specifying the tokenizer is recommended for compatibility and correctness according to GGUF documentation and vLLM GitHub discussions.
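Concretely, the serve line from the first answer with the tokenizer flag added (same mount path and file name as above; a sketch, not verified on this exact image):

```shell
# Serve the local GGUF file, but take the tokenizer from the original
# base model repo instead of extracting it from the GGUF.
vllm serve /app/models/gemma-3-27b-it-qat-q4_0.gguf \
  --tokenizer google/gemma-3-27b-it \
  --tensor-parallel-size 2
```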
Would you like more detail on why this is necessary or how to specify the tokenizer?
So how do I run the docker command with --tokenizer google/gemma-3-27b-it then?
(APIServer pid=1) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=1) Value error, Invalid repository ID or local directory specified: 'google/gemma-3-27b-it-qat-q4_0-gguf'.
(APIServer pid=1) Please verify the following requirements:
(APIServer pid=1) 1. Provide a valid Hugging Face repository ID.
(APIServer pid=1) 2. Specify a local directory that contains a recognized configuration file.
(APIServer pid=1) - For Hugging Face models: ensure the presence of a 'config.json'.
(APIServer pid=1) - For Mistral models: ensure the presence of a 'params.json'.
(APIServer pid=1) 3. For GGUF: pass the local path of the GGUF checkpoint.
(APIServer pid=1) Loading GGUF from a remote repo directly is not yet supported.
(APIServer pid=1) [type=value_error, input_value=ArgsKwargs((), {'model': …rocessor_plugin': None}), input_type=ArgsKwargs]
(APIServer pid=1) For further information visit Redirecting...
docker run -it --dns=192.168.49.1 --network=host --group-add=video --ipc=host \
  --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  --device /dev/kfd --device /dev/dri --shm-size=16g \
  -e HUGGING_FACE_HUB_TOKEN="" -e VLLM_SLEEP_WHEN_IDLE=1 \
  -e ROCM_VISIBLE_DEVICES=0 -e HIP_VISIBLE_DEVICES=0 \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.0 -e PYTORCH_ROCM_ARCH="gfx1100" \
  -e VLLM_USE_TRITON_FLASH_ATTN=0 -e GPU_MAX_HW_QUEUES=1 \
  -e NCCL_DEBUG=WARN -e NCCL_IB_DISABLE=1 --shm-size=10g \
  --name vllm_rocm_gemma3 \
  -v /home/ubuntu/vllm_models:/root/.cache/huggingface \
  rocm/vllm:latest \
  vllm serve google/gemma-3-27b-it-qat-q4_0-gguf --host 0.0.0.0 --port 8000 \
    --enforce-eager --served-model-name gemma3 --trust-remote-code \
    --dtype auto --kv-cache-dtype auto --max-model-len 2048 \
    --max-num-seqs 2 --max-num-batched-tokens 2048 \
    --gpu-memory-utilization 0.93 --swap-space 8 \
    --disable-log-requests --disable-log-stats --max-log-len 100
[aiter] import [module_aiter_enum] under /usr/local/lib/python3.12/dist-packages/aiter/jit/module_aiter_enum.so
(APIServer pid=1) INFO 12-14 09:24:54 [api_server.py:1338] vLLM API server version 0.11.2.dev673+g839868462
(APIServer pid=1) INFO 12-14 09:24:54 [utils.py:253] non-default args: {'model_tag': 'google/gemma-3-27b-it-qat-q4_0-gguf', 'host': '0.0.0.0', 'max_log_len': 100, 'model': 'google/gemma-3-27b-it-qat-q4_0-gguf', 'trust_remote_code': True, 'max_model_len': 2048, 'enforce_eager': True, 'served_model_name': ['gemma3'], 'gpu_memory_utilization': 0.93, 'swap_space': 8.0, 'max_num_batched_tokens': 2048, 'max_num_seqs': 2, 'disable_log_stats': True}
(APIServer pid=1) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 60, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1385, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1404, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 172, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 198, in build_async_engine_client_from_engine_args
(APIServer pid=1) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1371, in create_engine_config
(APIServer pid=1) model_config = self.create_model_config()
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1227, in create_model_config
(APIServer pid=1) return ModelConfig(
(APIServer pid=1) ^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=1) s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=1) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=1) Value error, Invalid repository ID or local directory specified: 'google/gemma-3-27b-it-qat-q4_0-gguf'.
(APIServer pid=1) Please verify the following requirements:
(APIServer pid=1) 1. Provide a valid Hugging Face repository ID.
(APIServer pid=1) 2. Specify a local directory that contains a recognized configuration file.
(APIServer pid=1) - For Hugging Face models: ensure the presence of a 'config.json'.
(APIServer pid=1) - For Mistral models: ensure the presence of a 'params.json'.
(APIServer pid=1) [type=value_error, input_value=ArgsKwargs((), {'model': …rocessor_plugin': None}), input_type=ArgsKwargs]
(APIServer pid=1) For further information visit Redirecting...
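The traceback matches requirement 3 in the error text: `vllm serve` was given the repo ID `google/gemma-3-27b-it-qat-q4_0-gguf`, but for GGUF vLLM wants the local path of the checkpoint file, with the tokenizer taken from the base model. A sketch of a corrected invocation, assuming the GGUF file has already been downloaded into the mounted /home/ubuntu/vllm_models directory (the file name is an assumption; adjust to what is actually on disk):

```shell
# Same mounts and ROCm device flags as the failing command, but serve the
# *local* GGUF path and name the base model's tokenizer explicitly.
# ASSUMPTION: the file /home/ubuntu/vllm_models/gemma-3-27b-it-q4_0.gguf
# exists; verify the name before running.
docker run -it --network=host --group-add=video --ipc=host \
  --device /dev/kfd --device /dev/dri --shm-size=16g \
  --name vllm_rocm_gemma3 \
  -v /home/ubuntu/vllm_models:/root/.cache/huggingface \
  rocm/vllm:latest \
  vllm serve /root/.cache/huggingface/gemma-3-27b-it-q4_0.gguf \
    --tokenizer google/gemma-3-27b-it \
    --host 0.0.0.0 --port 8000 \
    --served-model-name gemma3 --max-model-len 2048
```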