I have qwen3 vl which was served using vllm. I would like to give it video as input and control how many frames it parses. How can i do it?
To control how many frames Qwen3-VL parses from a video in vLLM, set the number of frames at server startup with the `--media-io-kwargs` argument, e.g. `--media-io-kwargs '{"video": {"num_frames": 40}}'`. Alternatively, you can set the sampling rate in frames per second via `--mm-processor-kwargs '{"fps": 4}'` at startup. These options are not currently supported dynamically per request in online inference; they must be set when launching `vllm serve`. For more details, see this discussion and the documentation.
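For reference, the startup invocations might look like the following. The model id here is a placeholder, and flag availability varies by vLLM version, so check `vllm serve --help` on your install first:

```shell
# Fixed frame count for all requests (placeholder model id):
vllm serve Qwen/Qwen3-VL-8B-Instruct \
  --media-io-kwargs '{"video": {"num_frames": 40}}'

# Or sample by frames-per-second instead:
vllm serve Qwen/Qwen3-VL-8B-Instruct \
  --mm-processor-kwargs '{"fps": 4}'
```

Note the single quotes around the JSON payloads: they keep the shell from mangling the inner double quotes.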
If you need per-request control, you must pre-sample frames externally before sending the video to vLLM, as runtime adjustment via API is not supported yet. Would you like more detail or code examples?
I get "`media-io-kwargs` does not exist" when using `vllm serve`. I am using vllm==0.11.0.