Qwen 2.5 Image inference error

mshah151187 · October 28, 2025, 7:39pm

Hi team,

I am using qwen2.5 for an image inference task. I have a text prompt and I am adding at the start of the prompt and pass this to Autotokenizer.apply_chat_messages. I build the image request and pass it to llm.generate. I am getting this error

RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates! This is likely because you forgot to include input placeholder tokens (e.g., <image>, <|image_pad|>) in the prompt. If the model has a chat template, make sure you have applied it before calling LLM.generate.

I have few questions.

I thought Autotokenizer for this model would add right tokens, which I believe are <image_pad> <vision_start> and <vision_end>. qwen 2.5 expects these tokens and they are not prompt after going through apply_chat_messages. Are these missing tokens causing the issue above?
I dont want to hardcode these tokens for qwen 2.5 and instead I am looking for standard approach of formatting prompts. If I want to support another model then I dont want to be in situation where I would have revisit the code again and again with tokens specific to the model. What is the standard approach here?

RunLLM · October 28, 2025, 7:40pm

Yes, the error occurs because your prompt is missing the required image placeholder tokens (e.g., <|vision_start|><|image_pad|><|vision_end|>) that Qwen2.5 expects. Using AutoTokenizer.apply_chat_template (not apply_chat_messages) with the correct message format will automatically insert these tokens for you, as long as you use the model’s chat template and message structure. Hardcoding tokens is not recommended; instead, always use the model’s provided chat template and message format to ensure compatibility across models. See this example and docs for the standard approach.

In summary: always use the model’s chat template (via AutoProcessor or AutoTokenizer.apply_chat_template) and pass messages in the expected format (with image/text content as structured objects), so the correct tokens are inserted automatically. This avoids model-specific hardcoding and ensures future compatibility. Would you like a step-by-step code example for Qwen2.5 or for supporting multiple models?

Sources:

Hint: Mention @RunLLM in the post for followups.

Topic		Replies	Views
Speeding up vllm inference for Qwen2.5-VL General	23	7285	June 27, 2025
IndexError: list index out of range (Qwen/Qwen2.5-VL-3B-Instruct) General	1	646	September 17, 2025
Why is inference for Qwen 2.5 VL so slow when we send an image? General	5	1645	August 20, 2025
Multimodal inference guideline? General	59	2467	August 6, 2025
Issue: Unable to pass precomputed image embeddings to vLLM General	12	311	January 14, 2026

Qwen 2.5 Image inference error

Related topics