I have qwen3 vl which was served using vllm. I would like to give it video as input and control how many frames it parses. How can i do it?
To control how many frames Qwen3-VL parses from a video in vLLM, set the number of frames at server startup with the `--media-io-kwargs` argument, e.g. `--media-io-kwargs '{"video": {"num_frames": 40}}'`. Alternatively, you can set the sampling rate in frames per second via `--mm-processor-kwargs '{"fps": 4}'` at startup. These options are not currently supported dynamically per request in online inference; they must be set when launching `vllm serve`. For more details, see this discussion and the documentation.
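For reference, the startup invocations might look like the following. The model id here is a placeholder, and flag availability varies by vLLM version, so check `vllm serve --help` on your install first:

```shell
# Fixed frame count for all requests (placeholder model id):
vllm serve Qwen/Qwen3-VL-8B-Instruct \
  --media-io-kwargs '{"video": {"num_frames": 40}}'

# Or sample by frames-per-second instead:
vllm serve Qwen/Qwen3-VL-8B-Instruct \
  --mm-processor-kwargs '{"fps": 4}'
```

Note the single quotes around the JSON payloads: they keep the shell from mangling the inner double quotes.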
If you need per-request control, you must pre-sample frames externally before sending the video to vLLM, as runtime adjustment via API is not supported yet. Would you like more detail or code examples?
I get "`media-io-kwargs` does not exist" when using `vllm serve`. I am using vllm==0.11.0.