Your vLLM server is running with --served-model-name vllm/gemma-3, so your benchmark command must use --model vllm/gemma-3 (not the Hugging Face name). Also, make sure you use standard dashes (-), not typographic dashes (–), in your command—otherwise, arguments may not be parsed correctly.
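To confirm which name the server actually exposes, you can query its OpenAI-compatible models endpoint (assuming the default port 8000):

curl http://localhost:8000/v1/models

The id field in the response should read vllm/gemma-3; that exact string is what --model must match.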
If you still get “Bad Request,” double-check for any invisible Unicode characters or copy-paste issues in your command. This exact approach is confirmed to work in GitHub Issue #19743.
Would you like help checking your command for hidden character issues?
You seem not to understand; with that command it won't work:
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 54, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/benchmark/serve.py", line 21, in cmd
    main(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/serve.py", line 959, in main
    return asyncio.run(main_async(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/benchmarks/serve.py", line 1002, in main_async
    tokenizer = get_tokenizer(tokenizer_id,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer.py", line 239, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 1049, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 881, in get_tokenizer_config
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 321, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 511, in cached_files
    raise OSError(
OSError: vllm/gemma-3 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with hf auth login or by passing token=<your_token>
Thank you for the detailed traceback. The root cause: the benchmark client tries to load a tokenizer from the --model value (vllm/gemma-3), which is neither a valid Hugging Face repo ID nor a local directory. The benchmark script needs a resolvable tokenizer path, not just the served model name.
Solution:
You must explicitly provide the tokenizer path using --tokenizer, pointing to the local directory containing tokenizer_config.json (usually the Hugging Face snapshot directory). For example:
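(This sketch assumes you are invoking the benchmark via vllm bench serve, which matches your traceback; the snapshot path is a placeholder to replace with the directory found by the command below, and any other flags you were already passing stay the same.)

vllm bench serve \
  --model vllm/gemma-3 \
  --tokenizer /root/.cache/huggingface/hub/models--google--gemma-3-12b-it/snapshots/<snapshot_hash>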
To automatically find the correct tokenizer path inside your container, run this command in your Docker shell:
find /root/.cache/huggingface/hub/models--google--gemma-3-12b-it -type d -name "snapshots" -exec find {} -type d \; | while read d; do if [ -f "$d/tokenizer_config.json" ]; then echo "$d"; fi; done
This will print the full path to the snapshot directory containing tokenizer_config.json, which you should use as the --tokenizer argument in your benchmark command. This approach is based on the Hugging Face cache structure described in GitHub Issue #9459.
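If you prefer a single step, you could capture the path in a shell variable and sanity-check it (a sketch; it assumes only one snapshot is present and simply takes the first match):

TOKENIZER_DIR=$(dirname "$(find /root/.cache/huggingface/hub/models--google--gemma-3-12b-it -name tokenizer_config.json | head -n 1)")
ls "$TOKENIZER_DIR"   # should show tokenizer_config.json alongside the other tokenizer files

You can then pass "$TOKENIZER_DIR" as the --tokenizer value.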
Would you like a ready-to-use benchmark command with the discovered path?
It never worked, and after half a day of trying the vLLM Docker benchmark with your instructions, as well as OpenAI's and Gemini's, it still does not work. Maybe the reason is that the model is hosted outside Docker, on my local computer.
Thank you for trying; I will never try again. Waste of time.
I'm sorry for the frustration; vLLM benchmarking with Docker and local models can be tricky because of path and environment mismatches. Mounting the model directory into Docker should work, but the path inside the container must exactly match what vLLM expects (the snapshot directory containing config.json and tokenizer_config.json). Many users have run into the same problem, especially with local models and tokenizer paths, as documented in GitHub Issues #9459 and #8938.
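For reference, a rough sketch of that approach (assumptions: the model lives in the default Hugging Face cache on the host, the benchmark runs from the vllm/vllm-openai image, and the snapshot hash is a placeholder):

docker run --rm -it --network host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --entrypoint /bin/bash \
  vllm/vllm-openai:latest

Then, inside the container shell:

vllm bench serve --model vllm/gemma-3 \
  --tokenizer /root/.cache/huggingface/hub/models--google--gemma-3-12b-it/snapshots/<snapshot_hash>

With the cache mounted this way, the snapshot path inside the container matches the host layout, which is exactly the condition described above. The --network host flag keeps localhost pointing at a server running on the host; adjust if your networking differs.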
If you ever reconsider, I recommend:
Verifying the exact snapshot path inside the container.
Explicitly passing --tokenizer with the full path.
Using only standard dashes (-) in all commands.
Thank you for your feedback, and sorry again for the inconvenience.