INFO: 127.0.0.1:38806 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:38798 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:38792 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:38884 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:38888 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:38898 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:38910 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:38926 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2025-11-27 19:37:41,761 INFO torch_tensor_accelerator_channel.py:807 – Creating communicator group 7059b3b8-b0e7-4632-9e89-1df7f2c4f60e on actors: [Actor(RayWorkerWrapper, 6066223b8e11fc63c315c5b502000000), Actor(RayWorkerWrapper, 55c0e21802ee932358ac4bf402000000), Actor(RayWorkerWrapper, 75cc638be6642954e5a1127a02000000), Actor(RayWorkerWrapper, e0b2c5427d125b5afc210a9402000000), Actor(RayWorkerWrapper, 29fad90f872c60b2efe0829802000000), Actor(RayWorkerWrapper, 9d2603edd05ac9bcda44a9b802000000), Actor(RayWorkerWrapper, d5f8a510f153cea2d327a19502000000), Actor(RayWorkerWrapper, 71926a937ac20129104d9af102000000), Actor(RayWorkerWrapper, efd1b81bb423b64932bd924202000000), Actor(RayWorkerWrapper, 1f9430738a77b398b7abb60302000000), Actor(RayWorkerWrapper, 30cd8580da7770b8737cfe3102000000), Actor(RayWorkerWrapper, af4afb2f0399336d73a5831902000000), Actor(RayWorkerWrapper, 85a38505bda9537ed1a2599a02000000), Actor(RayWorkerWrapper, 409192a91f249ad940ce1dcd02000000), Actor(RayWorkerWrapper, 84cdeb4f08f03f560ae82a0802000000), Actor(RayWorkerWrapper, ed86d327f4e2eef9b6fba7a802000000)]
2025-11-27 19:37:43,264 INFO torch_tensor_accelerator_channel.py:833 – Communicator group initialized.
Your logs show successful POST requests to /v1/chat/completions (HTTP 200 OK) and the initialization of a Ray communicator group for distributed inference. This indicates that vLLM is correctly handling concurrent chat completions and has set up distributed communication across multiple Ray worker actors, which is expected in a multi-GPU or multi-node deployment using Ray. There are no errors or warnings in the provided output—everything appears to be functioning as intended according to vLLM’s distributed serving design and Ray integration requirements (parallelism scaling docs, Ray communicator API).
If you are experiencing issues not shown in these logs, please specify the symptoms or errors. Would you like more detail on what these log messages mean or how to interpret distributed vLLM startup and request handling?
Sources:
- https://docs.vllm.ai/en/latest/serving/parallelism_scaling/#__span-2-2
- https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/ray_communicator/
Hint: Mention @RunLLM in the post for followups.
@RunLLM It then fails with the following error:
(RayWorkerWrapper pid=53515) INFO 11-27 19:36:50 [fused_moe.py:188] Initializing moe_cache_singleton shape: torch.Size([939524096]), memory: 1792.00 MB [repeated 15x across cluster]
(RayWorkerWrapper pid=53463) [lightop] hipModuleLoad: /usr/local/lib/python3.10/dist-packages/lightop/hsa/gfx936/gemm_w8a8_smooth_128x128x64_TN_BF16.co GetFunction: Cijk_Alik_Bljk_I8II_BH_MT128x128x64_TT8x8_WG16x16_Squant Success [repeated 44x across cluster]
(RayWorkerWrapper pid=9359, ip=28.58.225.233) INFO 11-27 19:36:51 [gpu_worker.py:239] Available KV cache memory: 29.15 GiB [repeated 15x across cluster]
ERROR 11-27 19:42:43 [core.py:599] EngineCore encountered a fatal error.
ERROR 11-27 19:42:43 [core.py:599] Traceback (most recent call last):
ERROR 11-27 19:42:43 [core.py:599] File "/usr/local/lib/python3.10/dist-packages/ray/dag/compiled_dag_node.py", line 2529, in _execute_until
ERROR 11-27 19:42:43 [core.py:599] result = self._dag_output_fetcher.read(timeout)
ERROR 11-27 19:42:43 [core.py:599] File "/usr/local/lib/python3.10/dist-packages/ray/experimental/channel/common.py", line 312, in read
ERROR 11-27 19:42:43 [core.py:599] outputs = self._read_list(timeout)
ERROR 11-27 19:42:43 [core.py:599] File "/usr/local/lib/python3.10/dist-packages/ray/experimental/channel/common.py", line 403, in _read_list
ERROR 11-27 19:42:43 [core.py:599] raise e
ERROR 11-27 19:42:43 [core.py:599] File "/usr/local/lib/python3.10/dist-packages/ray/experimental/channel/common.py", line 385, in _read_list
ERROR 11-27 19:42:43 [core.py:599] result = c.read(min(remaining_timeout, iteration_timeout))
ERROR 11-27 19:42:43 [core.py:599] File "/usr/local/lib/python3.10/dist-packages/ray/experimental/channel/shared_memory_channel.py", line 776, in read
ERROR 11-27 19:42:43 [core.py:599] return self._channel_dict[self._resolve_actor_id()].read(timeout)
ERROR 11-27 19:42:43 [core.py:599] File "/usr/local/lib/python3.10/dist-packages/ray/experimental/channel/shared_memory_channel.py", line 480, in read
ERROR 11-27 19:42:43 [core.py:599] ret = self._worker.get_objects(
ERROR 11-27 19:42:43 [core.py:599] File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 932, in get_objects
ERROR 11-27 19:42:43 [core.py:599] ] = self.core_worker.get_objects(
ERROR 11-27 19:42:43 [core.py:599] File "python/ray/_raylet.pyx", line 3194, in ray._raylet.CoreWorker.get_objects
ERROR 11-27 19:42:43 [core.py:599] File "python/ray/includes/common.pxi", line 106, in ray._raylet.check_status
ERROR 11-27 19:42:43 [core.py:599] ray.exceptions.RayChannelTimeoutError: System error: Timed out waiting for object available to read. ObjectID: 00432c3dca111bd2391fd3dbc3bd2d5cb92928a30200000002e1f505
ERROR 11-27 19:42:43 [core.py:599]
ERROR 11-27 19:42:43 [core.py:599] The above exception was the direct cause of the following exception:
Your logs show that Ray distributed inference initializes normally, but EngineCore then fails with ray.exceptions.RayChannelTimeoutError: System error: Timed out waiting for object available to read. This means Ray's compiled-graph (DAG) channel timed out while waiting for data. It is commonly seen in multi-node/multi-GPU, high-concurrency, or large-model inference when a channel does not receive data in time, and may be caused by a communication bottleneck, uneven resource allocation, or a crashed worker. Similar errors are reported in the documentation and related issues.
Suggestions:
- Increase the environment variable RAY_CGRAPH_get_timeout (e.g. export RAY_CGRAPH_get_timeout=300) to avoid the default 10-second timeout;
- Check inter-node GPU communication (NCCL configuration, network bandwidth, NCCL_SOCKET_IFNAME);
- Confirm the Ray cluster has sufficient resources and that all workers are running normally;
- If using pipeline parallelism, test with a small degree of parallelism first and increase it gradually;
- Consult the official distributed troubleshooting documentation and related issue discussions.
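The environment tweaks above must be set before launching vLLM on every node; a minimal sketch (eth0 is a placeholder, substitute the interface that actually connects your nodes):

```shell
# Raise Ray Compiled Graph's read timeout from the 10 s default to 300 s.
export RAY_CGRAPH_get_timeout=300
# Pin NCCL/RCCL traffic to the NIC linking the nodes (eth0 is an assumption).
export NCCL_SOCKET_IFNAME=eth0
echo "RAY_CGRAPH_get_timeout=${RAY_CGRAPH_get_timeout} NCCL_SOCKET_IFNAME=${NCCL_SOCKET_IFNAME}"
```

These need to be exported in the shell (or Ray runtime environment) that starts the serving process, not after it is already running.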
Would you like more detailed troubleshooting steps or configuration suggestions?
@RunLLM I'm using very small requests and have set RAY_CGRAPH_get_timeout to 300, but the problem persists.
What other troubleshooting directions should I look at?
If RayChannelTimeoutError still occurs even with very small requests and RAY_CGRAPH_get_timeout set to 300, the problem is likely not the timeout itself but an underlying cause such as distributed communication failures, resource misallocation, or a crashed worker. Suggested troubleshooting directions:
- Verify GPU communication across all nodes (NCCL configuration, network connectivity, the NCCL_SOCKET_IFNAME setting); the official troubleshooting and sanity-check scripts can help validate this;
- Confirm that all Ray nodes and workers are healthy (ray status), with no dropped nodes or resource shortages;
- Check the logs for other anomalies such as OOM, worker crashes, or NCCL errors (e.g. insufficient GPU/CPU memory, incompatible drivers);
- Try lowering the degree of parallelism (reduce the tensor/pipeline parallel size) to rule out resource contention or deadlocks caused by over-parallelization;
- Make sure the model path, environment variables, and dependency versions are identical on all nodes, since mismatched environments can break communication.
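As a starting point for the log check above, a quick grep over Ray's default log directory can surface the usual signatures behind channel timeouts (the directory and patterns are assumptions; adjust them for your deployment):

```shell
# Ray writes per-worker logs under /tmp/ray/session_latest/logs by default.
LOGDIR="${RAY_LOG_DIR:-/tmp/ray/session_latest/logs}"
# List files containing common failure signatures; report if none match.
grep -rlE "out of memory|NCCL WARN|NCCL error|died unexpectedly" "$LOGDIR" 2>/dev/null \
  || echo "no matching error signatures in $LOGDIR"
```

Run this on every node, since the worker that crashed or hit OOM may not be the one whose channel read timed out.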
Would you like more detailed troubleshooting steps or specific commands?