How to benchmark vision

How do I benchmark a vision model? Is this correct:
Basic vision benchmark

python benchmarks/benchmark_serving.py \
  --backend vllm \
  --model RedHatAI/gemma-3-12b-it-FP8-dynamic \
  --dataset-name vision \
  --dataset-path /path/to/vision/dataset.json \
  --num-prompts 100 \
  --request-rate 10 \
  --host localhost \
  --port 8000

Your command is not quite right for benchmarking vision models in vLLM: vision is not a recognized --dataset-name value in the benchmark scripts. The correct approach is to use a supported vision dataset (such as VisionArena) through the HuggingFace dataset interface, i.e. --dataset-name hf with --dataset-path lmarena-ai/VisionArena-Chat. Also, vision models are typically benchmarked with the openai-chat backend against the /v1/chat/completions endpoint.

Example command:

python benchmarks/benchmark_serving.py \
  --backend openai-chat \
  --model RedHatAI/gemma-3-12b-it-FP8-dynamic \
  --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path lmarena-ai/VisionArena-Chat \
  --hf-split train \
  --num-prompts 100 \
  --request-rate 10 \
  --host localhost \
  --port 8000
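One prerequisite the command above assumes: the server must already be running and exposing the OpenAI-compatible API on the host/port you point the benchmark at. A minimal sketch of launching it (exact flags depend on your deployment; adjust host, port, and GPU settings as needed):

```shell
# Start the OpenAI-compatible vLLM server for the model under test.
# This serves /v1/chat/completions, which the openai-chat backend expects.
vllm serve RedHatAI/gemma-3-12b-it-FP8-dynamic \
  --host 0.0.0.0 \
  --port 8000
```

If the server is started on a different port, or behind a proxy that rewrites paths, the benchmark's --host/--port/--endpoint values must match what the server actually answers.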

See VisionArena Benchmark example, official recipes, and benchmark documentation for more details.

Would you like a step-by-step explanation or more details on dataset preparation?


That did not work out:

README.md: 8.75kB [00:00, 39.1MB/s]
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████████| 43/43 [00:00<00:00, 53.97it/s]
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████| 43/43 [00:00<00:00, 173920.03it/s]
Starting initial single prompt test run...
Traceback (most recent call last):
  File "/vllm-workspace/benchmarks/benchmark_serving.py", line 1305, in <module>
    main(args)
  File "/usr/local/lib/python3.12/dist-packages/typing_extensions.py", line 2956, in wrapper
    return arg(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/vllm-workspace/benchmarks/benchmark_serving.py", line 814, in main
    benchmark_result = asyncio.run(
                       ^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/vllm-workspace/benchmarks/benchmark_serving.py", line 289, in benchmark
    raise ValueError(
ValueError: Initial test run failed - Please make sure benchmark arguments are correctly specified. Error: Not Found
Fatal Python error: PyGILState_Release: thread state 0x7e5f641db630 must be current when releasing
Python runtime state: finalizing (tstate=0x0000000000b898f0)

Thread 0x00007e627e0a2000 (most recent call first):

Extension modules: sentencepiece._sentencepiece, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, PIL._imagingmath (total: 5)
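For what it's worth, "Error: Not Found" means the initial test request got an HTTP 404 from the server, which usually points at an endpoint or port mismatch rather than a dataset problem. A hedged way to see which routes the server actually answers (assuming it runs on localhost:8000; the route list here is just the common OpenAI-compatible ones) is a small probe like:

```python
# Probe the server's OpenAI-compatible routes before re-running the benchmark.
# A 404 on /v1/chat/completions would reproduce the "Not Found" failure above;
# 405 (method not allowed) still means the route exists.
import urllib.request
import urllib.error

BASE = "http://localhost:8000"
results = {}

for path in ("/v1/models", "/v1/completions", "/v1/chat/completions"):
    try:
        with urllib.request.urlopen(BASE + path, timeout=5) as resp:
            results[path] = resp.status
    except urllib.error.HTTPError as e:
        results[path] = e.code  # route answered, but with an error status
    except OSError:
        results[path] = "unreachable"  # server down or wrong host/port
    print(path, "->", results[path])
```

If /v1/chat/completions comes back 404 (or everything is unreachable), fix the server side first; if the routes respond, re-check the benchmark's --endpoint, --host, and --port arguments.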