I’ve been attempting to connect a vLLM engine (deployed as part of KubeAI) to a Ray cluster (deployed by KubeRay) and have not had much success: for some reason Ray never finds (or generates) the `node_ip_address.json` session file. I can confirm that running `ray status` in the vLLM engine pod prints exactly the same output as in the Ray cluster head pod, so vLLM is able to communicate with Ray. These are the logs from vLLM:
2025-04-30 17:31:15,749 INFO worker.py:1514 -- Using address ray-cluster-kuberay-head-svc.kuberay.svc.cluster.local:6379 set in the environment variable RAY_ADDRESS
2025-04-30 17:31:15,749 INFO worker.py:1654 -- Connecting to existing Ray cluster at address: ray-cluster-kuberay-head-svc.kuberay.svc.cluster.local:6379...
2025-04-30 17:31:16,766 INFO node.py:1084 -- Can't find a `node_ip_address.json` file from /tmp/ray/session_2025-04-29_22-14-32_731655_1. Have you started Ray instance using `ray start` or `ray.init`?
2025-04-30 17:31:26,771 INFO node.py:1084 -- Can't find a `node_ip_address.json` file from /tmp/ray/session_2025-04-29_22-14-32_731655_1. Have you started Ray instance using `ray start` or `ray.init`?
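Judging by the repeated log line and the 60-second timeout in the ValueError further down, Ray appears to poll the session directory for that file before giving up. A minimal sketch of that behaviour (my reconstruction for illustration, not Ray's actual implementation):

```python
import json
import os
import time

def wait_for_node_ip_address(session_dir: str, timeout_s: float = 60.0,
                             poll_s: float = 1.0) -> dict:
    """Poll session_dir for node_ip_address.json, mimicking the retry loop
    the logs above suggest; raise ValueError once timeout_s elapses."""
    path = os.path.join(session_dir, "node_ip_address.json")
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)
        if time.monotonic() >= deadline:
            raise ValueError(
                f"Can't find a `node_ip_address.json` file from {session_dir}"
            )
        time.sleep(poll_s)
```

The file is normally written by the local Ray node on startup, which is why the message asks whether `ray start` or `ray.init` was ever run on this host.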
Executing a health check from the vLLM engine pod returns an exit code of 0, so the Ray cluster is allegedly healthy:

ray health-check --address ray-cluster-kuberay-head-svc.kuberay.svc.cluster.local:6379
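For anyone debugging the same thing: since the error points at a session directory timestamped the day before the logs, a quick way to see which Ray session directories the vLLM pod can actually see, and whether any of them contains the missing file, would be something like this (a hypothetical diagnostic, not part of my deployment):

```python
import os

def session_file_status(ray_tmp: str = "/tmp/ray") -> dict:
    """Map each Ray session directory under ray_tmp to whether it contains
    node_ip_address.json (the file the error above says is missing)."""
    if not os.path.isdir(ray_tmp):
        return {}
    return {
        name: os.path.exists(os.path.join(ray_tmp, name, "node_ip_address.json"))
        for name in sorted(os.listdir(ray_tmp))
        if name.startswith("session_")
    }

for name, present in session_file_status().items():
    print(name, "ok" if present else "MISSING node_ip_address.json")
```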
Has anyone seen this behaviour before and still managed to connect vLLM to an external Ray cluster?
Engine Config:
args:
  - --dtype=bfloat16
  - --tensor-parallel-size=2
  - --pipeline-parallel-size=2
  - --no-enable-prefix-caching
  - --gpu-memory-utilization=0.95
  - --distributed-executor-backend=ray
  - --max-model-len=65536
engine: VLLM
env:
  RAY_ADDRESS: ray-cluster-kuberay-head-svc.kuberay.svc.cluster.local:6379
Versions:
- vLLM - 0.8.5, 0.8.2
- Ray - 2.43.0-py312
Platform:
- AKS (v1.30.9)
Stack Trace:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 400, in run_engine_core
raise e
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 387, in run_engine_core
engine_core = EngineCoreProc(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 329, in __init__
super().__init__(vllm_config, executor_class, log_stats,
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 64, in __init__
self.model_executor = executor_class(vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 286, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 105, in _init_executor
initialize_ray_cluster(self.parallel_config)
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_utils.py", line 299, in initialize_ray_cluster
ray.init(address=ray_address)
File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1797, in init
_global_node = ray._private.node.Node(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/node.py", line 204, in __init__
node_ip_address = self._wait_and_get_for_node_address()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/node.py", line 1091, in _wait_and_get_for_node_address
raise ValueError(
INFO 04-30 18:19:21 [ray_distributed_executor.py:127] Shutting down Ray distributed executor. If you see error log from logging.cc regarding SIGTERM received, please ignore because this is the expected termination process in Ray.
ValueError: Can't find a `node_ip_address.json` file from /tmp/ray/session_2025-04-29_22-14-32_731655_1. for 60 seconds. A ray instance hasn't started. Did you do `ray start` or `ray.init` on this host?
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1130, in <module>
uvloop.run(run_server(args))
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1078, in run_server
async with build_async_engine_client(args) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 178, in build_async_engine_client_from_engine_args
async_llm = AsyncLLM.from_vllm_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 150, in from_vllm_config
return cls(
^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 118, in __init__
self.engine_core = core_client_class(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 642, in __init__
super().__init__(
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 398, in __init__
self._wait_for_engine_startup()
File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 430, in _wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above.