Errors when running multi-node, multi-GPU inference with Ray + vLLM

Starting Qwen3-235B-A22B inference on a 2-node, 16-GPU cluster (8 GPUs per node).

The cluster status is:

root@root-1-241:/data/llm/app/Qwen3-235B-A22B# ray status
======== Autoscaler status: 2026-01-23 15:03:17.631346 ========
Node status
---------------------------------------------------------------
Active:
 1 node_60b782f567508b2a8c54c54cb54bb6f6c4e68e62cbe366664003069e
 1 node_11657f380647113b616b9b934e410a3daab898575243e4c96a7a02cf
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Total Usage:
 0.0/256.0 CPU
 0.0/16.0 GPU
 0B/1.78TiB memory
 0B/190.00GiB object_store_memory

Total Constraints:
 (no request_resources() constraints)
Total Demands:
 (no resource demands)
root@root-1-241:/data/llm/app/Qwen3-235B-A22B#

The serve command is:

nohup vllm serve /data/llm/models/Qwen3-235B-A22B -pp 2 -tp 8 --trust-remote-code --distributed-executor-backend ray --dtype bfloat16 --swap-space 16 --gpu-memory-utilization 0.8 --host 0.0.0.0 --port 11235 --max-model-len 20000 --reasoning-parser qwen3 --tool-call-parser hermes --enable-auto-tool-choice > vllm-$(date +%Y%m%d%H%M).log 2>&1 &

The following error occurs:

(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] EngineCore failed to start.
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] Traceback (most recent call last):
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 833, in run_engine_core
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 606, in __init__
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] super().__init__(
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 102, in __init__
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] self._init_executor()
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/ray_executor.py", line 97, in _init_executor
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] self._init_workers_ray(placement_group)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/ray_executor.py", line 370, in _init_workers_ray
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] self.collective_rpc("init_device")
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/ray_executor.py", line 493, in collective_rpc
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] return ray.get(ray_worker_outputs, timeout=timeout)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] return fn(*args, **kwargs)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] return func(*args, **kwargs)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/ray/_private/worker.py", line 2858, in get
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/ray/_private/worker.py", line 958, in get_objects
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] raise value.as_instanceof_cause()
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ray.exceptions.RayTaskError(AcceleratorError): ray::RayWorkerWrapper.execute_method() (pid=974, ip=192.168.205.25, actor_id=d3849d9ac13f4515a54d8f8502000000, repr=<vllm.v1.executor.ray_utils.RayWorkerWrapper object at 0x7faa41db23f0>)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 343, in execute_method
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] raise e
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 332, in execute_method
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] return run_method(self, method, args, kwargs)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 479, in run_method
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] return func(*args, **kwargs)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 324, in init_device
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 216, in init_device
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] current_platform.set_device(self.device)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/vllm_metax/platform.py", line 148, in set_device
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] torch.cuda.set_device(device)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] File "/opt/conda/lib/python3.12/site-packages/torch/cuda/__init__.py", line 570, in set_device
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] torch._C._cuda_setDevice(device)
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] torch.AcceleratorError: CUDA error: invalid device ordinal
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=9433) ERROR 01-23 14:46:01 [core.py:842] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

The error "CUDA error: invalid device ordinal" means vLLM tried to select a GPU index that does not exist on the worker's node. In a multi-node, multi-GPU Ray cluster this usually points to inconsistent device assignment or GPUs that were not exposed correctly. Based on the official docs and related issues, common causes are:

  1. The GPU count on the Ray nodes does not match the vLLM parallelism settings (--tensor-parallel-size, --pipeline-parallel-size), or some node was not assigned GPU resources correctly.
  2. CUDA_VISIBLE_DEVICES is set incorrectly on some nodes, or the container was started without --gpus all, so the vLLM process cannot access all GPUs.
  3. Ray's placement group failed to allocate GPUs correctly, or the current node has no available GPU.
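The failure mode behind cause 2 can be reproduced without a GPU: PyTorch raises "invalid device ordinal" whenever the requested index is greater than or equal to the number of devices the process can see. A minimal sketch of that check (the helper names are illustrative, not vLLM or PyTorch APIs):

```python
def visible_gpu_count(env):
    """How many GPUs a process can see, mirroring how the CUDA runtime
    honours CUDA_VISIBLE_DEVICES. None means the variable is unset and
    all physical GPUs are visible."""
    val = env.get("CUDA_VISIBLE_DEVICES")
    if val is None:
        return None
    val = val.strip()
    return len(val.split(",")) if val else 0  # empty string hides every GPU

def set_device_checked(ordinal, env):
    """Fail the same way torch.cuda.set_device() does when the ordinal
    is out of range for the visible devices."""
    count = visible_gpu_count(env)
    if count is not None and ordinal >= count:
        raise RuntimeError(
            f"CUDA error: invalid device ordinal ({ordinal} >= {count} visible)")
    return ordinal

# A Ray worker assigned local rank 4 on a node that only exposes two
# GPUs hits exactly the error in the traceback above:
env = {"CUDA_VISIBLE_DEVICES": "0,1"}
set_device_checked(1, env)    # in range, succeeds
# set_device_checked(4, env)  # RuntimeError: CUDA error: invalid device ordinal
```

So when one node exposes all 8 GPUs but another only exposes a subset (missing `--gpus all`, or a stale `CUDA_VISIBLE_DEVICES`), workers placed on the restricted node get ordinals that are out of range there.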

Suggestions:

  • Run nvidia-smi on every node to confirm all 8 GPUs are usable, and check that ray status shows all 16 GPUs registered.
  • Confirm that docker run includes --gpus all --ipc=host --shm-size=16G (or equivalent) on every node.
  • Check the CUDA_VISIBLE_DEVICES environment variable on each node and make sure all GPUs are exposed.
  • With the Ray backend, start Ray on every node with a consistent environment, run vllm serve once on the head node, and make sure the product of --tensor-parallel-size and --pipeline-parallel-size equals the total number of GPUs in the cluster.
  • Try smaller parallelism settings (tp/pp) to check whether the problem is resource allocation.
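As a quick arithmetic check for the last two bullets: vLLM's world size is the product of tp and pp, and Ray must be able to place that many single-GPU workers across the cluster. A small sketch (hypothetical helper, not a vLLM API):

```python
def check_layout(tp, pp, nodes, gpus_per_node):
    """Verify that a tp/pp layout fits the cluster, the way the Ray
    executor needs to place one worker per GPU."""
    world_size = tp * pp                 # product, not sum
    total_gpus = nodes * gpus_per_node
    if world_size > total_gpus:
        raise ValueError(f"need {world_size} GPUs, cluster has {total_gpus}")
    return world_size

# The command above: -tp 8 -pp 2 on 2 nodes with 8 GPUs each
assert check_layout(tp=8, pp=2, nodes=2, gpus_per_node=8) == 16
```

With -tp 8 -pp 2 the layout exactly fills the 16 GPUs that `ray status` reports, so the sizes themselves are consistent; that is why the per-node visibility checks above are the more likely culprit here.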

Do you need more detailed troubleshooting steps?
