Hi,
I’m building vLLM in a Red Hat ubi8-minimal container, and one of the things I’ve had to do is microdnf install gcc-c++ in the Dockerfile. Unfortunately gcc seems to require kernel-headers in order to compile anything, and vLLM does not start up correctly when gcc is absent. The problem with kernel-headers is that it introduces a lot of reported vulnerabilities into the image, so I’m wondering: can I get away with not installing gcc at all?
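For context, the relevant part of the Dockerfile looks roughly like this (a simplified sketch; the base image reference and the pip install step are illustrative stand-ins, but the gcc-c++ line is the one I’d like to drop):

FROM registry.access.redhat.com/ubi8/ubi-minimal

# The step in question: gcc-c++ drags in glibc-headers and kernel-headers,
# and kernel-headers is what our vulnerability scanner flags.
RUN microdnf install -y python3.12 python3.12-pip gcc-c++ \
    && microdnf clean all

# Illustrative stand-in for the actual vLLM build/install step.
RUN python3.12 -m pip install vllm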
Please see below for the runtime error I get when I start up vLLM. Is there a way to configure vLLM so that it never invokes a C compiler at runtime?
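For reference, the server is launched via the OpenAI-compatible entrypoint, roughly like this (reconstructed from the non-default args reported in the log below, so treat the exact invocation as approximate):

python3.12 -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --model /meta/llama3.1-8b \
    --enable-auto-tool-choice \
    --tool-call-parser llama3_json \
    --max-model-len 12000 \
    --gpu-memory-utilization 0.95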
INFO 06-20 01:58:58 [__init__.py:244] Automatically detected platform cuda.
vllm | INFO 06-20 01:59:01 [api_server.py:1287] vLLM API server version 0.9.2.dev169+gea10dd9d9
vllm | INFO 06-20 01:59:01 [cli_args.py:309] non-default args: {'host': '0.0.0.0', 'enable_auto_tool_choice': True, 'tool_call_parser': 'llama3_json', 'model': '/meta/llama3.1-8b', 'max_model_len': 12000, 'gpu_memory_utilization': 0.95}
vllm | INFO 06-20 01:59:09 [config.py:831] This model supports multiple tasks: {'reward', 'classify', 'score', 'generate', 'embed'}. Defaulting to 'generate'.
vllm | INFO 06-20 01:59:09 [config.py:1444] Using max model len 12000
vllm | INFO 06-20 01:59:10 [awq_marlin.py:116] The model is convertible to awq_marlin during runtime. Using awq_marlin kernel.
vllm | INFO 06-20 01:59:10 [config.py:2197] Chunked prefill is enabled with max_num_batched_tokens=2048.
vllm | INFO 06-20 01:59:14 [__init__.py:244] Automatically detected platform cuda.
vllm | INFO 06-20 01:59:17 [core.py:459] Waiting for init message from front-end.
vllm | INFO 06-20 01:59:17 [core.py:69] Initializing a V1 LLM engine (v0.9.2.dev169+gea10dd9d9) with config: model='/meta/llama3.1-8b', speculative_config=None, tokenizer='/meta/llama3.1-8b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=12000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=awq_marlin, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/meta/llama3.1-8b, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
vllm | Process EngineCore_0:
vllm | ERROR 06-20 01:59:17 [core.py:519] EngineCore failed to start.
vllm | ERROR 06-20 01:59:17 [core.py:519] Traceback (most recent call last):
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 510, in run_engine_core
vllm | ERROR 06-20 01:59:17 [core.py:519] engine_core = EngineCoreProc(*args, **kwargs)
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 394, in __init__
vllm | ERROR 06-20 01:59:17 [core.py:519] super().__init__(vllm_config, executor_class, log_stats,
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 75, in __init__
vllm | ERROR 06-20 01:59:17 [core.py:519] self.model_executor = executor_class(vllm_config)
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/executor/executor_base.py", line 53, in __init__
vllm | ERROR 06-20 01:59:17 [core.py:519] self._init_executor()
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 46, in _init_executor
vllm | ERROR 06-20 01:59:17 [core.py:519] self.collective_rpc("init_worker", args=([kwargs], ))
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
vllm | ERROR 06-20 01:59:17 [core.py:519] answer = run_method(self.driver_worker, method, args, kwargs)
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/utils.py", line 2690, in run_method
vllm | ERROR 06-20 01:59:17 [core.py:519] return func(*args, **kwargs)
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/worker/worker_base.py", line 558, in init_worker
vllm | ERROR 06-20 01:59:17 [core.py:519] worker_class = resolve_obj_by_qualname(
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/utils.py", line 2258, in resolve_obj_by_qualname
vllm | ERROR 06-20 01:59:17 [core.py:519] module = importlib.import_module(module_name)
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/lib64/python3.12/importlib/__init__.py", line 90, in import_module
vllm | ERROR 06-20 01:59:17 [core.py:519] return _bootstrap._gcd_import(name[level:], package, level)
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
vllm | ERROR 06-20 01:59:17 [core.py:519] File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
vllm | ERROR 06-20 01:59:17 [core.py:519] File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
vllm | ERROR 06-20 01:59:17 [core.py:519] File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
vllm | ERROR 06-20 01:59:17 [core.py:519] File "<frozen importlib._bootstrap_external>", line 999, in exec_module
vllm | ERROR 06-20 01:59:17 [core.py:519] File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 29, in <module>
vllm | ERROR 06-20 01:59:17 [core.py:519] from vllm.v1.worker.gpu_model_runner import GPUModelRunner
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 33, in <module>
vllm | ERROR 06-20 01:59:17 [core.py:519] from vllm.model_executor.layers.mamba.mamba_mixer2 import MambaMixer2
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/model_executor/layers/mamba/mamba_mixer2.py", line 25, in <module>
vllm | ERROR 06-20 01:59:17 [core.py:519] from vllm.model_executor.layers.mamba.ops.ssd_combined import (
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/model_executor/layers/mamba/ops/ssd_combined.py", line 15, in <module>
vllm | ERROR 06-20 01:59:17 [core.py:519] from .ssd_bmm import _bmm_chunk_fwd
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/vllm/model_executor/layers/mamba/ops/ssd_bmm.py", line 16, in <module>
vllm | ERROR 06-20 01:59:17 [core.py:519] @triton.autotune(
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/triton/runtime/autotuner.py", line 378, in decorator
vllm | ERROR 06-20 01:59:17 [core.py:519] return Autotuner(fn, fn.arg_names, configs, key, reset_to_zero, restore_value, pre_hook=pre_hook,
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/triton/runtime/autotuner.py", line 130, in __init__
vllm | ERROR 06-20 01:59:17 [core.py:519] self.do_bench = driver.active.get_benchmarker()
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/triton/runtime/driver.py", line 23, in __getattr__
vllm | ERROR 06-20 01:59:17 [core.py:519] self._initialize_obj()
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
vllm | ERROR 06-20 01:59:17 [core.py:519] self._obj = self._init_fn()
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/triton/runtime/driver.py", line 9, in _create_driver
vllm | ERROR 06-20 01:59:17 [core.py:519] return actives[0]()
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/triton/backends/nvidia/driver.py", line 535, in __init__
vllm | ERROR 06-20 01:59:17 [core.py:519] self.utils = CudaUtils() # TODO: make static
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/triton/backends/nvidia/driver.py", line 89, in __init__
vllm | ERROR 06-20 01:59:17 [core.py:519] mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/triton/backends/nvidia/driver.py", line 66, in compile_module_from_src
vllm | ERROR 06-20 01:59:17 [core.py:519] so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
vllm | ERROR 06-20 01:59:17 [core.py:519] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | ERROR 06-20 01:59:17 [core.py:519] File "/usr/local/lib64/python3.12/site-packages/triton/runtime/build.py", line 18, in _build
vllm | ERROR 06-20 01:59:17 [core.py:519] raise RuntimeError("Failed to find C compiler. Please specify via CC environment variable.")
vllm | ERROR 06-20 01:59:17 [core.py:519] RuntimeError: Failed to find C compiler. Please specify via CC environment variable.
vllm | Traceback (most recent call last):
vllm | File "/usr/lib64/python3.12/multiprocessing/process.py", line 314, in _bootstrap
vllm | self.run()
vllm | File "/usr/lib64/python3.12/multiprocessing/process.py", line 108, in run
vllm | self._target(*self._args, **self._kwargs)
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 523, in run_engine_core
vllm | raise e
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 510, in run_engine_core
vllm | engine_core = EngineCoreProc(*args, **kwargs)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 394, in __init__
vllm | super().__init__(vllm_config, executor_class, log_stats,
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 75, in __init__
vllm | self.model_executor = executor_class(vllm_config)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/executor/executor_base.py", line 53, in __init__
vllm | self._init_executor()
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 46, in _init_executor
vllm | self.collective_rpc("init_worker", args=([kwargs], ))
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
vllm | answer = run_method(self.driver_worker, method, args, kwargs)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/utils.py", line 2690, in run_method
vllm | return func(*args, **kwargs)
vllm | ^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/worker/worker_base.py", line 558, in init_worker
vllm | worker_class = resolve_obj_by_qualname(
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/utils.py", line 2258, in resolve_obj_by_qualname
vllm | module = importlib.import_module(module_name)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/lib64/python3.12/importlib/__init__.py", line 90, in import_module
vllm | return _bootstrap._gcd_import(name[level:], package, level)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
vllm | File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
vllm | File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
vllm | File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
vllm | File "<frozen importlib._bootstrap_external>", line 999, in exec_module
vllm | File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 29, in <module>
vllm | from vllm.v1.worker.gpu_model_runner import GPUModelRunner
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 33, in <module>
vllm | from vllm.model_executor.layers.mamba.mamba_mixer2 import MambaMixer2
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/model_executor/layers/mamba/mamba_mixer2.py", line 25, in <module>
vllm | from vllm.model_executor.layers.mamba.ops.ssd_combined import (
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/model_executor/layers/mamba/ops/ssd_combined.py", line 15, in <module>
vllm | from .ssd_bmm import _bmm_chunk_fwd
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/model_executor/layers/mamba/ops/ssd_bmm.py", line 16, in <module>
vllm | @triton.autotune(
vllm | ^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/triton/runtime/autotuner.py", line 378, in decorator
vllm | return Autotuner(fn, fn.arg_names, configs, key, reset_to_zero, restore_value, pre_hook=pre_hook,
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/triton/runtime/autotuner.py", line 130, in __init__
vllm | self.do_bench = driver.active.get_benchmarker()
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/triton/runtime/driver.py", line 23, in __getattr__
vllm | self._initialize_obj()
vllm | File "/usr/local/lib64/python3.12/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
vllm | self._obj = self._init_fn()
vllm | ^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/triton/runtime/driver.py", line 9, in _create_driver
vllm | return actives[0]()
vllm | ^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/triton/backends/nvidia/driver.py", line 535, in __init__
vllm | self.utils = CudaUtils() # TODO: make static
vllm | ^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/triton/backends/nvidia/driver.py", line 89, in __init__
vllm | mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/triton/backends/nvidia/driver.py", line 66, in compile_module_from_src
vllm | so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/triton/runtime/build.py", line 18, in _build
vllm | raise RuntimeError("Failed to find C compiler. Please specify via CC environment variable.")
vllm | RuntimeError: Failed to find C compiler. Please specify via CC environment variable.
vllm | Traceback (most recent call last):
vllm | File "<frozen runpy>", line 198, in _run_module_as_main
vllm | File "<frozen runpy>", line 88, in _run_code
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1387, in <module>
vllm | uvloop.run(run_server(args))
vllm | File "/usr/local/lib64/python3.12/site-packages/uvloop/__init__.py", line 109, in run
vllm | return __asyncio.run(
vllm | ^^^^^^^^^^^^^^
vllm | File "/usr/lib64/python3.12/asyncio/runners.py", line 195, in run
vllm | return runner.run(main)
vllm | ^^^^^^^^^^^^^^^^
vllm | File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
vllm | return self._loop.run_until_complete(task)
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
vllm | File "/usr/local/lib64/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
vllm | return await main
vllm | ^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server
vllm | await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
vllm | async with build_async_engine_client(args, client_config) as engine_client:
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__
vllm | return await anext(self.gen)
vllm | ^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client
vllm | async with build_async_engine_client_from_engine_args(
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__
vllm | return await anext(self.gen)
vllm | ^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 191, in build_async_engine_client_from_engine_args
vllm | async_llm = AsyncLLM.from_vllm_config(
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 162, in from_vllm_config
vllm | return cls(
vllm | ^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 124, in __init__
vllm | self.engine_core = EngineCoreClient.make_async_mp_client(
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 93, in make_async_mp_client
vllm | return AsyncMPClient(vllm_config, executor_class, log_stats,
vllm | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 716, in __init__
vllm | super().__init__(
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 422, in __init__
vllm | self._init_engines_direct(vllm_config, local_only,
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 491, in _init_engines_direct
vllm | self._wait_for_engine_startup(handshake_socket, input_address,
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 511, in _wait_for_engine_startup
vllm | wait_for_engine_startup(
vllm | File "/usr/local/lib64/python3.12/site-packages/vllm/v1/utils.py", line 494, in wait_for_engine_startup
vllm | raise RuntimeError("Engine core initialization failed. "
vllm | RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
vllm exited with code 0