Could you expand a bit more on “inference speed” here?
Image + text inference should generally be slower than text-only inference in time-to-first-token (TTFT): the extra image preprocessing and encoding happens during the prefill phase of the request, before the first token can be emitted. Decoding performance, however, should be identical, since per-token decode cost does not depend on whether the prompt contained an image.
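As a rough sketch of why the gap shows up only in TTFT, here is a toy latency model. All constants below are hypothetical placeholders for illustration, not measured numbers from any real model:

```python
# Toy latency model: image inputs add a vision-encoder pass and extra
# prompt tokens to prefill (raising TTFT), while per-token decode cost
# is the same either way. All numbers are made up for illustration.

PREFILL_MS_PER_TOKEN = 0.2   # hypothetical prefill cost per prompt token
DECODE_MS_PER_TOKEN = 15.0   # hypothetical per-token decode latency
IMAGE_ENCODE_MS = 80.0       # hypothetical vision-encoder cost per image
IMAGE_TOKENS = 576           # hypothetical tokens produced per encoded image

def ttft_ms(text_tokens: int, num_images: int = 0) -> float:
    """Time-to-first-token: image encoding plus prefill over all prompt tokens."""
    prompt_tokens = text_tokens + num_images * IMAGE_TOKENS
    return num_images * IMAGE_ENCODE_MS + prompt_tokens * PREFILL_MS_PER_TOKEN

def decode_ms(output_tokens: int) -> float:
    """Decode time does not depend on whether the prompt contained images."""
    return output_tokens * DECODE_MS_PER_TOKEN

print(f"TTFT text-only:  {ttft_ms(200):.1f} ms")        # prefill only
print(f"TTFT with image: {ttft_ms(200, 1):.1f} ms")     # encoder + longer prefill
print(f"Decode 100 tokens (either case): {decode_ms(100):.1f} ms")
```

In this model the image request pays both the encoder cost and the prefill cost of the extra image tokens up front, but the decode loop afterward is unaffected.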