Hi,
I’m encountering an AttributeError: ‘str’ object has no attribute ‘content’ when trying to use the HuggingFaceTB/SmolVLM-500M-Instruct model with vLLM version 0.8.5.post1.
The error originates in vllm/model_executor/models/idefics3.py at the line where it attempts to access processor.image_token.content.
Investigation shows that for the HuggingFaceTB/SmolVLM-500M-Instruct model, AutoProcessor.from_pretrained(…).image_token returns a plain string (e.g., "<image>"), not an object with a .content attribute.
Error Traceback Snippet:
ERROR ... [core.py:396] File ".../vllm/model_executor/models/idefics3.py", line 206, in _get_image_token
ERROR ... [core.py:396] image_token = processor.image_token.content
ERROR ... [core.py:396] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR ... [core.py:396] AttributeError: 'str' object has no attribute 'content'
Investigation Script Output: A simple script using transformers.AutoProcessor confirms this:
from transformers import AutoProcessor
model_name = "HuggingFaceTB/SmolVLM-500M-Instruct"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
print(f"Type of processor.image_token: {type(processor.image_token)}")
print(f"Value of processor.image_token: {processor.image_token}")
# This will show:
# Type of processor.image_token: <class 'str'>
# Value of processor.image_token: <image>
It seems vLLM’s Idefics3 model implementation expects processor.image_token to be an object that exposes a .content attribute (e.g., a tokenizers AddedToken instance) rather than a plain string. This makes it incompatible with models like SmolVLM-500M-Instruct, at least with the transformers version I have installed, where the processor provides the image token as a plain string.
Could vLLM’s idefics3.py be updated to handle cases where processor.image_token is a string directly, perhaps by checking its type before attempting to access a .content attribute? For example:
if hasattr(processor.image_token, 'content'):
    image_token_str = processor.image_token.content
elif isinstance(processor.image_token, str):
    image_token_str = processor.image_token
else:
    # Handle unexpected type
    ...
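To make the idea easy to test without loading the model, here is a minimal, self-contained sketch of the same check. The helper name resolve_image_token and the TokenWithContent stand-in class are hypothetical and exist only for this illustration; they are not part of vLLM or transformers, and the actual fix in idefics3.py may look different.

```python
from dataclasses import dataclass


@dataclass
class TokenWithContent:
    """Hypothetical stand-in for a token object that exposes `.content`."""
    content: str


def resolve_image_token(image_token) -> str:
    """Return the image token as a plain string, whichever form it takes.

    Hypothetical helper for illustration; not an existing vLLM function.
    """
    # Case 1: the processor already provides a plain string.
    if isinstance(image_token, str):
        return image_token
    # Case 2: the processor provides an object carrying the string in `.content`.
    content = getattr(image_token, "content", None)
    if isinstance(content, str):
        return content
    # Anything else is unexpected, so fail loudly instead of guessing.
    raise TypeError(
        f"Unsupported image_token type: {type(image_token).__name__}"
    )


print(resolve_image_token("<image>"))
print(resolve_image_token(TokenWithContent("<image>")))
```

Both calls return the same string, so the rest of the code would not need to care which form the processor uses.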
Any guidance or a potential fix would be greatly appreciated.
Thanks!