Hi team,
I am using Qwen2.5 for an image inference task. I have a text prompt, I add the image at the start of the prompt, and I pass this through AutoTokenizer.apply_chat_template. I then build the image request and pass it to llm.generate, but I am getting this error:
RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates! This is likely because you forgot to include input placeholder tokens (e.g., <image>, <|image_pad|>) in the prompt. If the model has a chat template, make sure you have applied it before calling LLM.generate.
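For context, here is a minimal sketch of my code path; the model name, image path, and sampling settings are placeholders for my actual setup:

```python
# Minimal sketch of my current code path. Model name, image path, and
# sampling settings are placeholders for my actual setup.
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
llm = LLM(model=MODEL)

# Plain-string content -- the chat template never sees an image entry,
# so no image placeholder tokens end up in the rendered prompt.
messages = [{"role": "user", "content": "Describe this image."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

image = Image.open("example.jpg")
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
```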
I have a few questions.
- I thought AutoTokenizer for this model would add the right tokens, which I believe are <|image_pad|>, <|vision_start|>, and <|vision_end|>. Qwen2.5 expects these tokens, but they are not present in the prompt after going through apply_chat_template (see the first snippet after this list). Are these missing tokens causing the error above?
- I don't want to hardcode these tokens for Qwen2.5; instead, I am looking for a standard approach to formatting prompts. If I want to support another model later, I don't want to end up revisiting the code again and again to handle model-specific tokens. What is the standard approach here? My current guess is in the second snippet after this list.
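To demonstrate the first point, I print the rendered prompt and check for the pad token (continuing from the sketch above):

```python
# Inspect what apply_chat_template actually produced. With plain-string
# message content, none of the vision tokens show up in the output.
rendered = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(rendered)
print("<|image_pad|>" in rendered)  # prints False in my runs
```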
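For the second point, my best guess at a model-agnostic pattern is the structured-content message format, where apply_chat_template expands a generic image entry into whatever placeholder tokens the model's own chat template defines. This is my assumption of the intended usage, not something I have confirmed:

```python
# My assumption of the model-agnostic pattern: structured content parts.
# Each model's chat template is then responsible for expanding the generic
# image entry into its own placeholder tokens.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # hoping to see <|vision_start|><|image_pad|><|vision_end|> here
```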