关于vllm接受base64图片数据时的大小限制问题

环境描述

vllm version: 0.8.3
vllm-flash-attn version: 2.6.2

情况描述

我在使用vllm进行推理的时候,我输入的内容包含system/user prompt等对话历史messages, 以及两张图片,我的图片格式为base64格式,我发现当我输入的base64格式图片长度低于200的时候,vllm会hang住,无法返回我的请求,这是vllm本身的限制吗?

更多详细信息

具体情况的示例,如下图所示:
卡67在起vllm rollout的服务,卡0~5在训练,卡0向卡67进行通讯传入request,发现卡0以及67的利用率均为0,此时已经hang住了:


输入的request格式如下:
可以看到有一个图片<base64_image_length_104>长度为104,导致vllm hang住

{
  "request_number": 98,
  "timestamp": "2025-07-01 11:05:51",
  "request_size_mb": 13.800942420959473,
  "process_pid": 84128,
  "cuda_devices": "0,1,",
  "request_data": {
    "infer_requests": [
      {
        "messages": [
          {
            "role": "system",
            "content": "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. T...nswer> answer here </answer>\n"
          },
          {
            "role": "user",
            "content": "In this UI screenshot, I want to perform the command 'open memo app'.\nPlease think step-by-step to convert this high-level command into a concrete UI action.\nIn the <think> s...box>\n"
          },
          {
            "role": "assistant",
            "content": "<think> I need to find and open the Memo app, which is not visible on the current screen. I'll need to swipe through the screens to find it.\n</think> <bbox>[1368, 404, 1496, 559] </bbox>"
          },
          {
            "role": "user",
            "content": "<image><image>\nI have provided you with two images:\n1. The first image is the complete original UI screenshot\n2. The second image is the cropped area based on the bounding ....at."
          }
        ],
        "images": [
          "<base64_image_length_5638616>",
          "<base64_image_length_32168>"
        ],
        "audios": [],
        "videos": [],
        "tools": null,
        "objects": {}
      },
      {
        "messages": [
          {
            "role": "system",
            "content": "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. T...e </answer>\n"
          },
          {
            "role": "user",
            "content": "In this UI screenshot, I want to perform the command 'open memo app'.\nPlease think step-by-step to convert this high-level command into a concrete UI action...\n"
          },
          {
            "role": "assistant",
            "content": "<think> I need to find and open the memo app from the home screen. The memo app is typically represented by icons labeled 'Voice Memos' or 'Note'. I will look for the icon that best matches this description.\n</think> <bbox>[1596, 164, 1727, 283] </bbox>"
          },
          {
            "role": "user",
            "content": "<image><image>\nI have provided you with two images:\n1. The first image is the complete original UI screenshot\n2. The second image is the cropped area based on the bounding box [1596, 164, 1727, 283] ...\nPlease strictly follow the format."
          }
        ],
        "images": [
          "<base64_image_length_5638616>",
          "<base64_image_length_11324>"
        ],
        "audios": [],
        "videos": [],
        "tools": null,
        "objects": {}
      },
      {
        "messages": [
          {
            "role": "system",
            "content": "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The...er here </answer>\n"
          },
          {
            "role": "user",
            "content": "In this UI screenshot, I want to perform the command 'more settings'.\nPlease think step-by-step to convert this high-level command into a concrete UI action.\...\n"
          },
          {
            "role": "assistant",
            "content": "<think> 用户想要打开更多设置,这通常需要在应用的顶部或侧边导航栏中找到设置或更多选项图标。\n</think> <bbox>[1004, 1250, 1012, 1254] </bbox>"
          },
          {
            "role": "user",
            "content": "<image><image>\nI have provided you with two images:\n1. The first image is the complete original UI screenshot\n2. The second image is the cropped area based on the bounding box [1004, 1250, 1012, 1254] you identified in the first turn\n\..."
          }
        ],
        "images": [
          "<base64_image_length_1568512>",
          "<base64_image_length_104>"
        ],
        "audios": [],
        "videos": [],
        "tools": null,
        "objects": {}
      },
      {
        "messages": [
          {
            "role": "system",
            "content": "A conversation between User and Assistant. The user asks a question, and the Assistant solv.... here </answer>\n"
          },
          {
            "role": "user",
            "content": "In this UI screenshot, I want to perform the command 'more settings'.\nPlease think step-by-step to convert this high-level command into a concrete UI action.\n...\n"
          },
          {
            "role": "assistant",
            "content": "<think> To perform the 'more settings' command, I need to access the application's settings menu. T...</bbox>"
          },
          {
            "role": "user",
            "content": "<image><image>\nI have provided you with two images:\n1. The first image is the ...."
          }
        ],
        "images": [
          "<base64_image_length_1568512>",
          "<base64_image_length_692>"
        ],
        "audios": [],
        "videos": [],
        "tools": null,
        "objects": {}
      }
    ],
    "request_config": {
      "max_tokens": 1024,
      "temperature": 1.0,
      "top_k": 50,
      "top_p": 0.9,
      "repetition_penalty": 1.0,
      "num_beams": 1,
      "stop": [],
      "seed": null,
      "stream": false,
      "logprobs": false,
      "top_logprobs": null,
      "n": 1,
      "best_of": null,
      "presence_penalty": 0.0,
      "frequency_penalty": 0.0,
      "length_penalty": 1.0
    },
    "metrics": null,
    "template": null,
    "use_tqdm": false,
    "adapter_request": null
  }
}

根据你的描述和日志,vLLM在处理base64图片长度低于200时会hang住,无法返回请求。根据vllm官方issue #17769,当输入的图片数据不完整或过小(如base64长度很短,实际为损坏或截断图片),vLLM底层PIL库会抛出OSError: image file is truncated,导致推理流程卡住。官方建议在发送给vLLM前先对图片进行校验和预处理,确保图片数据完整,否则vLLM无法通过校验,进而hang住。

vLLM本身没有提供自动处理截断图片的机制,需在业务侧过滤或修复异常图片。你可以在推理前用PIL尝试打开图片,捕获异常后丢弃或修复,再传给vLLM。此问题并非vLLM的“限制”,而是输入数据未通过底层图片库校验导致的hang。
需要更详细的处理建议或代码示例吗?

Sources: