Your command is not quite right for benchmarking vision models in vLLM. The dataset name you passed is not one that vLLM's benchmark scripts recognize. The supported approach is to use a vision dataset hosted on Hugging Face (such as VisionArena) through the HF dataset interface: set --dataset-name hf and --dataset-path lmarena-ai/VisionArena-Chat. For vision models you also typically need the openai-chat backend with the /v1/chat/completions endpoint, since image inputs go through the chat API rather than plain completions.
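As a sketch, a corrected invocation could look like the following. The model name, server URL, and prompt count are placeholders for illustration, not values taken from your setup; adjust them to match the model you are actually serving:

```shell
# First serve a vision-capable model (example model; substitute your own):
#   vllm serve Qwen/Qwen2.5-VL-7B-Instruct

# Then benchmark against the chat completions endpoint with the
# VisionArena dataset pulled via the Hugging Face dataset interface:
python3 benchmarks/benchmark_serving.py \
  --backend openai-chat \
  --endpoint /v1/chat/completions \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --dataset-name hf \
  --dataset-path lmarena-ai/VisionArena-Chat \
  --hf-split train \
  --num-prompts 100
```

The key points are the hf dataset name, the openai-chat backend, and the chat completions endpoint; the remaining flags follow the usual benchmark_serving.py conventions.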
README.md: 8.75kB [00:00, 39.1MB/s]
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████████| 43/43 [00:00<00:00, 53.97it/s]
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████| 43/43 [00:00<00:00, 173920.03it/s]
Starting initial single prompt test run…
Traceback (most recent call last):
  File "/vllm-workspace/benchmarks/benchmark_serving.py", line 1305, in <module>
    main(args)
  File "/usr/local/lib/python3.12/dist-packages/typing_extensions.py", line 2956, in wrapper
    return arg(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/vllm-workspace/benchmarks/benchmark_serving.py", line 814, in main
    benchmark_result = asyncio.run(
                       ^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/vllm-workspace/benchmarks/benchmark_serving.py", line 289, in benchmark
    raise ValueError(
ValueError: Initial test run failed - Please make sure benchmark arguments are correctly specified. Error: Not Found
Fatal Python error: PyGILState_Release: thread state 0x7e5f641db630 must be current when releasing
Python runtime state: finalizing (tstate=0x0000000000b898f0)
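The "Error: Not Found" in the traceback is the server's HTTP 404 response, which usually means the benchmark is hitting an endpoint path that does not exist on the target server. Assuming the server is running on localhost:8000 (a placeholder; use your actual host and port, and your actual model name), a quick curl check can confirm the endpoint before re-running the benchmark:

```shell
# List the models the server exposes; this should succeed and include
# the vision model you intend to benchmark:
curl http://localhost:8000/v1/models

# Probe the chat completions route directly and print only the HTTP
# status code; a 404 here means the --base-url/--endpoint combination
# passed to the benchmark is wrong:
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-VL-7B-Instruct", "messages": [{"role": "user", "content": "hi"}]}'
```

If the probe returns 200, re-run the benchmark with the same base URL and endpoint; if it returns 404, double-check that the server was started in OpenAI-compatible serving mode and that the endpoint flag matches /v1/chat/completions exactly.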