AttributeError: 'str' object has no attribute 'content' with HuggingFaceTB/SmolVLM-500M-Instruct and vLLM 0.8.5.post1

Hi,

I’m encountering an AttributeError: ‘str’ object has no attribute ‘content’ when trying to use the HuggingFaceTB/SmolVLM-500M-Instruct model with vLLM version 0.8.5.post1.

The error originates in vllm/model_executor/models/idefics3.py at the line where it attempts to access processor.image_token.content.

Investigation shows that for the HuggingFaceTB/SmolVLM-500M-Instruct model, AutoProcessor.from_pretrained(…).image_token returns a plain string (e.g., “<image>”), not an object with a .content attribute.

Error Traceback Snippet:

ERROR ... [core.py:396]   File ".../vllm/model_executor/models/idefics3.py", line 206, in _get_image_token
ERROR ... [core.py:396]     image_token = processor.image_token.content
ERROR ... [core.py:396]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR ... [core.py:396] AttributeError: 'str' object has no attribute 'content'

Investigation Script and Output: A short script using transformers.AutoProcessor confirms this:

from transformers import AutoProcessor
model_name = "HuggingFaceTB/SmolVLM-500M-Instruct"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
print(f"Type of processor.image_token: {type(processor.image_token)}")
print(f"Value of processor.image_token: {processor.image_token}")
# This will show:
# Type of processor.image_token: <class 'str'>
# Value of processor.image_token: <image>

It seems vLLM’s Idefics3 model implementation expects processor.image_token to be a token object that exposes .content (e.g., an AddedToken instance) rather than a plain string. This breaks models like SmolVLM-500M-Instruct, whose processor provides the image token as a plain string.

Could vLLM’s idefics3.py be updated to handle cases where processor.image_token is a string directly, perhaps by checking its type before attempting to access a .content attribute? For example:

if hasattr(processor.image_token, 'content'):
    image_token_str = processor.image_token.content
elif isinstance(processor.image_token, str):
    image_token_str = processor.image_token
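else:
    # Fail loudly on an unexpected type instead of continuing without a token
    raise TypeError(f"Unexpected image_token type: {type(processor.image_token)}")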

Any guidance or a potential fix would be greatly appreciated.

Thanks!

You are correct: for HuggingFaceTB/SmolVLM-500M-Instruct, processor.image_token is a string, not an object with a .content attribute. The vLLM Idefics3 implementation expects .content, which causes the AttributeError. This is a known issue, and the correct approach is to check the type of processor.image_token and use it directly if it’s a string, as you suggested. This fix has already been implemented for SmolVLM in vLLM by overriding the relevant method to handle both cases (see PR #16017).

If you are using a vLLM version that predates this fix, you will encounter this error. Upgrading to a release that includes PR #16017 will resolve the issue for SmolVLM models. If you need to patch locally, your proposed type check is the right approach: use processor.image_token directly when it is a string, and fall back to .content otherwise.
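For a local patch, the type check can be pulled into a small helper. The following is only a minimal sketch, not the code merged upstream: resolve_image_token and FakeAddedToken are hypothetical names introduced here for illustration, and FakeAddedToken merely stands in for a token object that exposes .content, so the example runs without vLLM or transformers installed.

```python
from dataclasses import dataclass


@dataclass
class FakeAddedToken:
    """Hypothetical stand-in for a token object that exposes `.content`."""
    content: str


def resolve_image_token(image_token) -> str:
    """Return the image token as a plain string (hypothetical helper).

    Accepts either a plain string or an object whose `.content`
    attribute holds the string; anything else raises TypeError.
    """
    if isinstance(image_token, str):
        return image_token
    content = getattr(image_token, "content", None)
    if isinstance(content, str):
        return content
    raise TypeError(
        f"Unsupported image_token type: {type(image_token).__name__}"
    )


# Both representations resolve to the same string.
print(resolve_image_token("<image>"))
print(resolve_image_token(FakeAddedToken("<image>")))
```

In a patched idefics3.py, the failing line would then call something like resolve_image_token(processor.image_token) instead of reading .content directly. Where exactly to place the helper depends on your installed vLLM source.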

Would you like more detail on how to patch this in your local vLLM installation?
