I have two H100s and four A100s, and I am trying to deploy Qwen2.5-7B with vLLM on a Ray cluster, but I get this error, which makes no sense to me:
(RayWorkerWrapper pid=950) ERROR 08-17 02:52:40 [worker_base.py:619] ValueError: Free memory on device (73.89/93.1 GiB) on startup is less than desired GPU memory utilization (0.9, 83.79 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
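For context, this is roughly how I launch it on the head node. I am reconstructing this from memory, so the model ID and flag values below are illustrative rather than my exact invocation; the flags themselves are standard vllm serve options:

# Illustrative launch command; values are my best recollection, not exact
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --distributed-executor-backend ray \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.9
# 0.9 is also vLLM's default and matches the 0.9 in the error above

The full traceback: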
(pid=1282, ip=HEAD_NODE_IP) INFO 08-17 02:58:51 [__init__.py:235] Automatically detected platform cuda.
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 54, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 52, in cmd
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1791, in run_server
    await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1811, in run_server_worker
    async with build_async_engine_client(args, client_config) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
    async_llm = AsyncLLM.from_vllm_config(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 163, in from_vllm_config
    return cls(
           ^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 117, in __init__
    self.engine_core = EngineCoreClient.make_async_mp_client(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 98, in make_async_mp_client
    return AsyncMPClient(*client_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 677, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 408, in __init__
    with launch_core_engines(vllm_config, executor_class,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 697, in launch_core_engines
    wait_for_engine_startup(
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 750, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
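If I read this right, the RuntimeError is just the startup wrapper and the real root cause is the ValueError at the top: with --gpu-memory-utilization 0.9, vLLM wants 0.9 × 93.1 ≈ 83.79 GiB on that device, but only 73.89 GiB is free, so roughly 19 GiB is already taken by something else. Following the message's own suggestion, any value below 73.89 / 93.1 ≈ 0.79 should at least pass the check. A workaround sketch (0.75 is an arbitrary pick below that bound; it does not explain what is holding the memory):

# Workaround sketch: request less than the ~79% of device memory that is actually free
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --distributed-executor-backend ray \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.75
# must stay below 73.89/93.1 ≈ 0.79 to clear the startup check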
My Ray status:
Active:
 1 node_942272b0ead70054e8f1a2489cc736b6202c050f00ce4a6094da5b11
 1 node_4e310ef633848ce78fa4008a0b8b4100d93d60355f6b1b099498018f
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
Total Usage:
 0.0/256.0 CPU
 0.0/5.0 GPU
 0B/1.58TiB memory
 0B/19.46GiB object_store_memory
Total Constraints:
 (no request_resources() constraints)
Total Demands:
 (no resource demands)
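Two things I notice but cannot explain: ray status reports only 5.0 GPU in total even though two H100s plus four A100s should add up to 6, and per the ValueError roughly 19 GiB is already in use on the failing device before vLLM allocates anything. To find what is holding that memory I plan to run this on each node (plain nvidia-smi queries, nothing vLLM-specific):

# Per-GPU memory in use vs. total, before starting vLLM
nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv
# Processes currently holding GPU memory
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

What could be holding that memory, and why does Ray see only 5 GPUs?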