Qwen 2.5 Image inference error

Hi team,

I am using Qwen 2.5 for an image inference task. I have a text prompt, to which I add the image at the start, and I pass this to AutoTokenizer.apply_chat_messages. I then build the image request and pass it to llm.generate. I am getting this error:

RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates! This is likely because you forgot to include input placeholder tokens (e.g., <image>, <|image_pad|>) in the prompt. If the model has a chat template, make sure you have applied it before calling LLM.generate.

I have a few questions.

  1. I thought AutoTokenizer for this model would add the right tokens, which I believe are <|image_pad|>, <|vision_start|>, and <|vision_end|>. Qwen 2.5 expects these tokens, but they are not present in the prompt after going through apply_chat_messages. Are these missing tokens causing the issue above?
  2. I don't want to hardcode these tokens for Qwen 2.5; instead I am looking for a standard approach to formatting prompts. If I want to support another model, I don't want to be in a situation where I have to revisit the code again and again with model-specific tokens. What is the standard approach here?

Yes, the error occurs because your prompt is missing the required image placeholder tokens (e.g., <|vision_start|><|image_pad|><|vision_end|>) that Qwen2.5 expects. Using AutoTokenizer.apply_chat_template (not apply_chat_messages) with the correct message format will insert these tokens for you automatically, as long as you pass the messages in the structure the model's chat template expects. Hardcoding tokens is not recommended; instead, always rely on the model's own chat template and message format so your code stays compatible across models. See the sketch below and the model docs for the standard approach.
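Here is a minimal sketch of that approach with vLLM. The checkpoint name (Qwen/Qwen2.5-VL-7B-Instruct) and the image path (example.jpg) are placeholders, not taken from your post; the key point is that the placeholder tokens come from the processor's chat template, not from your own code:

```python
from PIL import Image
from transformers import AutoProcessor
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"  # placeholder checkpoint name

processor = AutoProcessor.from_pretrained(MODEL)

# Structured message: the image is a content item, not raw text in the prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# apply_chat_template reads the model's own template and inserts
# <|vision_start|><|image_pad|><|vision_end|> where the image item appears.
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

llm = LLM(model=MODEL)
image = Image.open("example.jpg")  # placeholder image path

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

This mirrors the pattern in vLLM's multimodal examples; the only Qwen-specific part is the checkpoint name.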

In summary: always use the model’s chat template (via AutoProcessor or AutoTokenizer.apply_chat_template) and pass messages in the expected format (with image/text content as structured objects), so the correct tokens are inserted automatically. This avoids model-specific hardcoding and ensures future compatibility. Would you like a step-by-step code example for Qwen2.5 or for supporting multiple models?
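As a sketch of the model-agnostic version: because apply_chat_template reads whichever template ships with the checkpoint, the prompt-building code can be parameterized by model name. The structured {"type": "image"} / {"type": "text"} content convention is what most recent vision-language chat templates expect, though a few older models may use a different message schema:

```python
from transformers import AutoProcessor


def build_vision_prompt(model_name: str, question: str) -> str:
    """Build an image+text prompt using the model's own chat template,
    so no model-specific placeholder tokens are hardcoded here."""
    processor = AutoProcessor.from_pretrained(model_name)
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]
    return processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


# The same call works across models; each template inserts its own tokens.
print(build_vision_prompt("Qwen/Qwen2.5-VL-7B-Instruct", "Describe this image."))
```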
