vLLM 0.10.1 v1 benchmark: only part of the requests are processed before it gets stuck

The vLLM version is 0.10.1. The v0 benchmark runs successfully, but v1 requests get stuck. When a client request gets stuck, it eventually times out. The benchmark log is provided below.

Traceback (most recent call last):
  File "/opt/ac2/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 539, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ac2/lib/python3.12/site-packages/aiohttp/streams.py", line 680, in read
    await self._waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/ysgg1/vllm_split/scripts/benchmark/backend_request_func.py", line 188, in async_request_openai
    async with session.post(url=api_url, json=payload, headers=headers) as response:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ac2/lib/python3.12/site-packages/aiohttp/client.py", line 1517, in __aenter__
    self._resp: _RetType = await self._coro
                           ^^^^^^^^^^^^^^^^
  File "/opt/ac2/lib/python3.12/site-packages/aiohttp/client.py", line 786, in _request
    resp = await handler(req)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/ac2/lib/python3.12/site-packages/aiohttp/client.py", line 764, in _connect_and_send_request
    await resp.start(conn)
  File "/opt/ac2/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 534, in start
    with self._timer:
         ^^^^^^^^^^^
  File "/opt/ac2/lib/python3.12/site-packages/aiohttp/helpers.py", line 713, in __exit__
    raise asyncio.TimeoutError from exc_val
TimeoutError

[5m54s < 40m35s] Progress: 127/1000 (1 failed) (12.7%), Decoding: 0, Prefilling: 38, TTFT: 0.00 (15779.53), ITL: 21.37 (21.37), Decoding throughput: 0.00 (273.82), Prefilling throughput: 0.00 (67.14)

The log of the scheduler is below.

finish serving request: deaec2cd-7a00-4d3c-9c6e-c41024ec6b85
connection of request: 289076dc-e996-4cbd-ad6e-8c5ea911e404, scheduler request: e8420178-1460-45d3-9ea0-3c85e27a2f75 closed without finish
finish prefill stage of request[abort] or some wrong with input parameter : e8420178-1460-45d3-9ea0-3c85e27a2f75
finish serving request: e8420178-1460-45d3-9ea0-3c85e27a2f75
http: proxy error: context canceled

The server’s log did not report any errors.

Your issue, where vLLM v1 requests get stuck and time out during benchmarking while v0 works, matches several known problems with vLLM's async serving and concurrency. The symptoms (stuck requests, client timeouts, "connection closed without finish", "finish prefill stage of request[abort]") are consistent with server-side hangs or deadlocks under load, especially with many concurrent requests. This is a recurring issue in vLLM, often discussed in the context of the v1 engine and async serving, and it is not always accompanied by server-side error logs. Increasing timeouts or using streaming mode can sometimes help, but a robust fix may require updating to a newer vLLM version or adjusting server concurrency settings. See the discussions in vLLM issues #13422, #1492, and #1762.
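
On the timeout point: the TimeoutError in your traceback is raised by aiohttp's client-side request timeout, so raising it on the benchmark's ClientSession keeps slow requests from being reported as failures while you investigate the server-side stall. A minimal sketch, assuming an OpenAI-compatible completions endpoint; the function name and timeout value below are illustrative, not taken from your actual backend_request_func.py:

import aiohttp

# Illustrative only: a generous total timeout so slow prefill under heavy load
# does not surface as asyncio.TimeoutError on the client. Tune to your benchmark.
AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(total=6 * 60 * 60)

async def async_request_openai(api_url: str, payload: dict, headers: dict) -> dict:
    # The timeout is attached to the session, so every request made through it
    # inherits the larger limit instead of aiohttp's default.
    async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session:
        async with session.post(url=api_url, json=payload, headers=headers) as response:
            response.raise_for_status()
            return await response.json()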

For troubleshooting, try: (1) increasing client/server timeouts, (2) enabling streaming mode ("stream": true), (3) reducing concurrent request load, and (4) updating to the latest vLLM version, as many deadlock and async bugs have been fixed in recent releases. If the problem persists, enable debug logging as described in the official troubleshooting guide to gather more information. Would you like a step-by-step breakdown of these troubleshooting steps?
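
As a sketch of points (1) through (3), the snippet below uses a large client timeout, enables streaming, and caps the number of in-flight requests with a semaphore. The endpoint URL, model name, concurrency limit, and response parsing are placeholder assumptions for an OpenAI-compatible /v1/completions server, not values taken from your setup:

import asyncio
import json

import aiohttp

# Placeholder endpoint and model name for illustration.
API_URL = "http://localhost:8000/v1/completions"
MODEL = "your-model-name"

# (3) Cap concurrency: at most 32 requests in flight at once (arbitrary example).
SEMAPHORE = asyncio.Semaphore(32)

async def stream_one(session: aiohttp.ClientSession, prompt: str) -> str:
    # (2) "stream": True asks the server to send tokens as they are generated,
    # so the client sees data well before a long request would hit the timeout.
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": 128, "stream": True}
    async with SEMAPHORE:
        async with session.post(API_URL, json=payload) as resp:
            resp.raise_for_status()
            text = ""
            # Server-sent events arrive as lines of the form "data: {...}".
            async for raw_line in resp.content:
                line = raw_line.decode("utf-8").strip()
                if not line.startswith("data: ") or line == "data: [DONE]":
                    continue
                chunk = json.loads(line[len("data: "):])
                text += chunk["choices"][0].get("text", "")
            return text

async def main() -> None:
    # (1) A large total timeout so genuinely stuck requests still fail loudly.
    timeout = aiohttp.ClientTimeout(total=3600)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        prompts = [f"Prompt {i}" for i in range(100)]
        results = await asyncio.gather(*(stream_one(session, p) for p in prompts))
        print(f"Completed {len(results)} requests")

if __name__ == "__main__":
    asyncio.run(main())

For the debug-logging step, the troubleshooting guide's usual starting point is launching the server with the VLLM_LOGGING_LEVEL=DEBUG environment variable, which surfaces engine and scheduler activity even when no errors are reported in the normal log.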
