Size limit when vLLM receives base64 image data

caseclose · July 1, 2025, 5:11am

Environment

vllm version: 0.8.3
vllm-flash-attn version: 2.6.2

Problem description

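When I run inference with vLLM, my input contains the conversation history (system/user prompts and other messages) plus two images, both passed as base64 strings. I found that when one of the base64 images is shorter than 200 characters, vLLM hangs and never returns a response to my request. Is this a limitation of vLLM itself?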
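For scale: base64 encodes every 3 bytes as 4 characters, so a string of length 200 decodes to at most 150 bytes, and the 104-character image in the request below is only about 78 bytes. A minimal sketch for checking this, assuming plain standard base64 with "=" padding and no data: URI prefix (decoded_size is a hypothetical helper name):

```python
import base64


def decoded_size(b64: str) -> int:
    """Byte length that a standard, padded base64 string decodes to."""
    s = b64.strip()
    # Each trailing "=" stands for one missing byte in the last 4-char group.
    padding = len(s) - len(s.rstrip("="))
    return len(s) * 3 // 4 - padding


# Cross-check against an actual decode: 78 bytes encode to exactly 104 chars.
sample = base64.b64encode(b"x" * 78).decode("ascii")
assert len(sample) == 104
assert decoded_size(sample) == len(base64.b64decode(sample)) == 78
```

A file that small can still be a valid image, just a very tiny one, so the length alone does not tell you whether the data is corrupted.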

More details

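Here is a concrete example, as shown in the figure below:
GPUs 6 and 7 run the vLLM rollout service, GPUs 0 to 5 are training, and GPU 0 sends requests to GPUs 6 and 7. Utilization on GPU 0 and on GPUs 6 and 7 dropped to 0, at which point the job had hung: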

The request looks like this:
Note that one image, <base64_image_length_104>, has a length of only 104, and this is what makes vLLM hang.

{
  "request_number": 98,
  "timestamp": "2025-07-01 11:05:51",
  "request_size_mb": 13.800942420959473,
  "process_pid": 84128,
  "cuda_devices": "0,1,",
  "request_data": {
    "infer_requests": [
      {
        "messages": [
          {
            "role": "system",
            "content": "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. T...nswer> answer here </answer>\n"
          },
          {
            "role": "user",
            "content": "In this UI screenshot, I want to perform the command 'open memo app'.\nPlease think step-by-step to convert this high-level command into a concrete UI action.\nIn the <think> s...box>\n"
          },
          {
            "role": "assistant",
            "content": "<think> I need to find and open the Memo app, which is not visible on the current screen. I'll need to swipe through the screens to find it.\n</think> <bbox>[1368, 404, 1496, 559] </bbox>"
          },
          {
            "role": "user",
            "content": "<image><image>\nI have provided you with two images:\n1. The first image is the complete original UI screenshot\n2. The second image is the cropped area based on the bounding ....at."
          }
        ],
        "images": [
          "<base64_image_length_5638616>",
          "<base64_image_length_32168>"
        ],
        "audios": [],
        "videos": [],
        "tools": null,
        "objects": {}
      },
      {
        "messages": [
          {
            "role": "system",
            "content": "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. T...e </answer>\n"
          },
          {
            "role": "user",
            "content": "In this UI screenshot, I want to perform the command 'open memo app'.\nPlease think step-by-step to convert this high-level command into a concrete UI action...\n"
          },
          {
            "role": "assistant",
            "content": "<think> I need to find and open the memo app from the home screen. The memo app is typically represented by icons labeled 'Voice Memos' or 'Note'. I will look for the icon that best matches this description.\n</think> <bbox>[1596, 164, 1727, 283] </bbox>"
          },
          {
            "role": "user",
            "content": "<image><image>\nI have provided you with two images:\n1. The first image is the complete original UI screenshot\n2. The second image is the cropped area based on the bounding box [1596, 164, 1727, 283] ...\nPlease strictly follow the format."
          }
        ],
        "images": [
          "<base64_image_length_5638616>",
          "<base64_image_length_11324>"
        ],
        "audios": [],
        "videos": [],
        "tools": null,
        "objects": {}
      },
      {
        "messages": [
          {
            "role": "system",
            "content": "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The...er here </answer>\n"
          },
          {
            "role": "user",
            "content": "In this UI screenshot, I want to perform the command 'more settings'.\nPlease think step-by-step to convert this high-level command into a concrete UI action.\...\n"
          },
          {
            "role": "assistant",
            "content": "<think> 用户想要打开更多设置，这通常需要在应用的顶部或侧边导航栏中找到设置或更多选项图标。\n</think> <bbox>[1004, 1250, 1012, 1254] </bbox>"
          },
          {
            "role": "user",
            "content": "<image><image>\nI have provided you with two images:\n1. The first image is the complete original UI screenshot\n2. The second image is the cropped area based on the bounding box [1004, 1250, 1012, 1254] you identified in the first turn\n\..."
          }
        ],
        "images": [
          "<base64_image_length_1568512>",
          "<base64_image_length_104>"
        ],
        "audios": [],
        "videos": [],
        "tools": null,
        "objects": {}
      },
      {
        "messages": [
          {
            "role": "system",
            "content": "A conversation between User and Assistant. The user asks a question, and the Assistant solv.... here </answer>\n"
          },
          {
            "role": "user",
            "content": "In this UI screenshot, I want to perform the command 'more settings'.\nPlease think step-by-step to convert this high-level command into a concrete UI action.\n...\n"
          },
          {
            "role": "assistant",
            "content": "<think> To perform the 'more settings' command, I need to access the application's settings menu. T...</bbox>"
          },
          {
            "role": "user",
            "content": "<image><image>\nI have provided you with two images:\n1. The first image is the ...."
          }
        ],
        "images": [
          "<base64_image_length_1568512>",
          "<base64_image_length_692>"
        ],
        "audios": [],
        "videos": [],
        "tools": null,
        "objects": {}
      }
    ],
    "request_config": {
      "max_tokens": 1024,
      "temperature": 1.0,
      "top_k": 50,
      "top_p": 0.9,
      "repetition_penalty": 1.0,
      "num_beams": 1,
      "stop": [],
      "seed": null,
      "stream": false,
      "logprobs": false,
      "top_logprobs": null,
      "n": 1,
      "best_of": null,
      "presence_penalty": 0.0,
      "frequency_penalty": 0.0,
      "length_penalty": 1.0
    },
    "metrics": null,
    "template": null,
    "use_tqdm": false,
    "adapter_request": null
  }
}
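In the third request above, the second image is the crop of bounding box [1004, 1250, 1012, 1254], which spans only 8 × 4 pixels, and it is exactly the image whose base64 string has length 104. If the crops are produced on the training side, one option (repairing rather than dropping) is to enlarge degenerate boxes to a minimum size before cropping. A minimal sketch, assuming boxes are [x1, y1, x2, y2] in pixel coordinates of the full screenshot (the post does not state the convention) and a hypothetical minimum side of 28 pixels; expand_bbox and safe_crop are illustrative names, not part of vLLM or any training framework:

```python
from PIL import Image


def expand_bbox(bbox, img_w, img_h, min_side=28):
    """Grow [x1, y1, x2, y2] around its centre so each side is >= min_side,
    keeping the box inside an img_w x img_h image. min_side is an assumption."""
    x1, y1, x2, y2 = bbox

    def grow(lo, hi, limit):
        if hi - lo >= min_side:
            return lo, hi  # already large enough, leave untouched
        side = min(min_side, limit)  # cannot exceed the image itself
        centre = (lo + hi) / 2
        new_lo = round(centre - side / 2)
        new_lo = max(0, min(new_lo, limit - side))  # shift back inside the image
        return new_lo, new_lo + side

    nx1, nx2 = grow(x1, x2, img_w)
    ny1, ny2 = grow(y1, y2, img_h)
    return [nx1, ny1, nx2, ny2]


def safe_crop(img: Image.Image, bbox, min_side=28) -> Image.Image:
    """Crop img to bbox after enlarging it to at least min_side per side."""
    return img.crop(tuple(expand_bbox(bbox, img.width, img.height, min_side)))


# The 8 x 4 box from the request becomes a 28 x 28 box around the same centre.
print(expand_bbox([1004, 1250, 1012, 1254], 2000, 2000))
```

Growing the box around its centre, instead of padding the crop with blank pixels, keeps real screen content around the location the model picked.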

RunLLM · July 1, 2025, 5:12am

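Based on your description and logs, vLLM hangs and never returns a response when a base64 image is shorter than 200 characters. According to vLLM issue #17769, when the input image data is incomplete or too small (for example, a very short base64 string that is actually a corrupted or truncated image), the underlying PIL library raises OSError: image file is truncated, which stalls the inference pipeline. The recommendation there is to validate and preprocess images before sending them to vLLM so that the data is complete; otherwise the image fails validation and vLLM hangs.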

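vLLM itself has no built-in mechanism for handling truncated images, so abnormal images have to be filtered out or repaired on the application side. Before inference, try opening each image with PIL, catch any exception, and drop or repair the image before passing the request to vLLM. So this is not a "limit" in vLLM as such; the hang is caused by input data that fails validation in the underlying image library.
Would you like more detailed handling suggestions or a code example?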
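The pre-check described above can be sketched with Pillow. This is a minimal sketch, not an official vLLM utility: validate_b64_image, filter_infer_requests and MIN_SIDE are hypothetical names, the request layout (a list of dicts, each with an "images" list of base64 strings) is assumed from the log in the question, and the minimum-size check is an extra assumption on top of the truncation check described in the answer.

```python
import base64
import io

from PIL import Image

# Hypothetical threshold, not a documented vLLM limit: choose it to match the
# smallest image your model's processor accepts (Qwen2-VL-style processors,
# for example, work in 28-pixel units).
MIN_SIDE = 28


def validate_b64_image(b64: str, min_side: int = MIN_SIDE) -> bool:
    """Return True if b64 decodes to a complete image with both sides >= min_side."""
    if b64.startswith("data:"):
        # Strip a data URI prefix such as "data:image/png;base64,".
        b64 = b64.split(",", 1)[-1]
    try:
        raw = base64.b64decode(b64, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    try:
        # verify() checks structure and checksums without decoding pixels and
        # leaves the object unusable, so the image is opened a second time.
        with Image.open(io.BytesIO(raw)) as img:
            img.verify()
        with Image.open(io.BytesIO(raw)) as img:
            img.load()  # full decode; truncated data fails here as well
            width, height = img.size
    except Exception:
        # Unknown format, truncated stream, bad checksum, etc.
        return False
    return width >= min_side and height >= min_side


def filter_infer_requests(infer_requests: list[dict]) -> tuple[list[dict], list[int]]:
    """Keep requests whose images all validate; also return indices of dropped ones."""
    kept, dropped = [], []
    for idx, req in enumerate(infer_requests):
        if all(validate_b64_image(img) for img in req.get("images", [])):
            kept.append(req)
        else:
            dropped.append(idx)
    return kept, dropped
```

Logging the dropped indices makes it easy to see which rollout samples were skipped, instead of the whole batch silently hanging.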

Sources:

vLLM issue #17769
