Not able to run GLM-4.5-Air on ROCm 7.0 with 2x 7900 XTX

Getting these errors when trying to serve GLM-4.5-Air with 2x 7900 XTX.
I tried to run it with this command using the latest vLLM ROCm Docker image:

docker run -it --dns=192.168.49.1 --network=host --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device /dev/kfd --device /dev/dri --shm-size=16g -e HUGGING_FACE_HUB_TOKEN="" -e VLLM_SLEEP_WHEN_IDLE=1 -e ROCM_VISIBLE_DEVICES=0,1 -e HIP_VISIBLE_DEVICES=0,1 -e HSA_OVERRIDE_GFX_VERSION=11.0.0 -e PYTORCH_ROCM_ARCH="gfx1100" -e VLLM_USE_TRITON_FLASH_ATTN=0 -e GPU_MAX_HW_QUEUES=1 -e NCCL_DEBUG=WARN -e NCCL_IB_DISABLE=1 --restart unless-stopped --name vllm_rocm_GLM-4.5-Air -v /home/ubuntu/vllm_models:/root/.cache/huggingface rocm/vllm:latest vllm serve zai-org/GLM-4.5-Air --tool-call-parser glm45 --reasoning-parser glm45 --enable-auto-tool-choice --host 0.0.0.0 --port 8000 --enforce-eager --served-model-name vllm/GLM --tensor-parallel-size 2 --trust-remote-code --dtype bfloat16 --kv-cache-dtype auto --max-model-len 4096 --max-num-seqs 2 --max-num-batched-tokens 16384 --gpu-memory-utilization 0.92 --swap-space 22 --disable-log-requests --disable-log-stats --max-log-len 100

(Worker_TP1 pid=199) INFO 10-17 22:44:10 [gpu_model_runner.py:2596] Starting to load model zai-org/GLM-4.5-Air…
(Worker_TP0 pid=198) INFO 10-17 22:44:10 [gpu_model_runner.py:2596] Starting to load model zai-org/GLM-4.5-Air…
(Worker_TP1 pid=199) INFO 10-17 22:44:10 [gpu_model_runner.py:2628] Loading model from scratch…
(Worker_TP0 pid=198) INFO 10-17 22:44:10 [gpu_model_runner.py:2628] Loading model from scratch…
(Worker_TP1 pid=199) INFO 10-17 22:44:10 [rocm.py:245] Using Rocm/Aiter Attention backend on V1 engine.
(Worker_TP0 pid=198) INFO 10-17 22:44:10 [rocm.py:245] Using Rocm/Aiter Attention backend on V1 engine.
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py”, line 571, in worker_main
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] worker = WorkerProc(*args, **kwargs)
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py”, line 437, in init
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.worker.load_model()
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py”, line 213, in load_model
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py”, line 2629, in load_model
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.model = model_loader.load_model(
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py”, line 45, in load_model
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] model = initialize_model(vllm_config=vllm_config,
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py”, line 63, in initialize_model
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] return model_class(vllm_config=vllm_config, prefix=prefix)
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py”, line 571, in worker_main
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] worker = WorkerProc(*args, **kwargs)
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 630, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.model = Glm4MoeModel(vllm_config=vllm_config,
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py”, line 437, in init
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.worker.load_model()
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py”, line 201, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py”, line 213, in load_model
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 432, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py”, line 2629, in load_model
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.start_layer, self.end_layer, self.layers = make_layers(
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.model = model_loader.load_model(
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py”, line 627, in make_layers
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py”, line 45, in load_model
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}“))
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] model = initialize_model(vllm_config=vllm_config,
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 434, in
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py”, line 63, in initialize_model
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] lambda prefix: Glm4MoeDecoderLayer(
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] return model_class(vllm_config=vllm_config, prefix=prefix)
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 365, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 630, in init
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.mlp = Glm4MoE(
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.model = Glm4MoeModel(vllm_config=vllm_config,
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 160, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py”, line 201, in init
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.experts = SharedFusedMoE(
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 432, in init
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/shared_fused_moe/shared_fused_moe.py”, line 25, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.start_layer, self.end_layer, self.layers = make_layers(
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] super().init(**kwargs)
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py”, line 1140, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py”, line 627, in make_layers
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.quant_method.create_weights(layer=self, **moe_quant_params)
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] maybe_offload_to_cpu(layer_fn(prefix=f”{prefix}.{idx}"))
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py”, line 365, in create_weights
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] w13_weight = torch.nn.Parameter(torch.empty(
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 434, in
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] lambda prefix: Glm4MoeDecoderLayer(
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py”, line 103, in torch_function
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] return func(*args, **kwargs)
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 365, in init
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.mlp = Glm4MoE(
(Worker_TP1 pid=199) ERROR 10-17 22:44:11 [multiproc_executor.py:597] torch.OutOfMemoryError: HIP out of memory. Tried to allocate 1.38 GiB. GPU 1 has a total capacity of 23.98 GiB of which 712.00 MiB is free. Of the allocated memory 22.89 GiB is allocated by PyTorch, and 31.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management ( CUDA semantics — PyTorch 2.9 documentation )
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py”, line 160, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.experts = SharedFusedMoE(
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/shared_fused_moe/shared_fused_moe.py”, line 25, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] super().init(**kwargs)
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py”, line 1140, in init
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] self.quant_method.create_weights(layer=self, **moe_quant_params)
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py”, line 365, in create_weights
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] w13_weight = torch.nn.Parameter(torch.empty(
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] File “/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py”, line 103, in torch_function
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] return func(*args, **kwargs)
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=198) ERROR 10-17 22:44:11 [multiproc_executor.py:597] torch.OutOfMemoryError: HIP out of memory. Tried to allocate 1.38 GiB. GPU 0 has a total capacity of 23.98 GiB of which 712.00 MiB is free. Of the allocated memory 22.89 GiB is allocated by PyTorch, and 31.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management ( CUDA semantics — PyTorch 2.9 documentation )
(Worker_TP1 pid=199) INFO 10-17 22:44:11 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=198) INFO 10-17 22:44:11 [multiproc_executor.py:558] Parent process exited, terminating worker
[rank0]:[W1017 22:44:11.428210311 ProcessGroupNCCL.cpp:1522] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see Distributed communication package - torch.distributed — PyTorch 2.9 documentation (function operator())
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 699, in run_engine_core
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 498, in init
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] super().init(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 83, in init
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py”, line 54, in init
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py”, line 106, in _init_executor
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py”, line 509, in wait_for_ready
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] raise e from None
(EngineCore_DP0 pid=144) ERROR 10-17 22:44:12 [core.py:708] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=144) Process EngineCore_DP0:
(EngineCore_DP0 pid=144) Traceback (most recent call last):
(EngineCore_DP0 pid=144) File “/usr/lib/python3.12/multiprocessing/process.py”, line 314, in _bootstrap
(EngineCore_DP0 pid=144) self.run()
(EngineCore_DP0 pid=144) File “/usr/lib/python3.12/multiprocessing/process.py”, line 108, in run
(EngineCore_DP0 pid=144) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=144) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 712, in run_engine_core
(EngineCore_DP0 pid=144) raise e
(EngineCore_DP0 pid=144) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 699, in run_engine_core
(EngineCore_DP0 pid=144) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=144) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=144) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 498, in init
(EngineCore_DP0 pid=144) super().init(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=144) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 83, in init
(EngineCore_DP0 pid=144) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=144) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=144) File “/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py”, line 54, in init
(EngineCore_DP0 pid=144) self._init_executor()
(EngineCore_DP0 pid=144) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py”, line 106, in _init_executor
(EngineCore_DP0 pid=144) self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=144) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=144) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py”, line 509, in wait_for_ready
(EngineCore_DP0 pid=144) raise e from None
(EngineCore_DP0 pid=144) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File “/usr/local/bin/vllm”, line 7, in
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py”, line 54, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py”, line 57, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/uvloop/init.py”, line 109, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/lib/python3.12/asyncio/runners.py”, line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/lib/python3.12/asyncio/runners.py”, line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “uvloop/loop.pyx”, line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/uvloop/init.py”, line 61, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 1863, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 1881, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/lib/python3.12/contextlib.py”, line 210, in aenter
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 178, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/lib/python3.12/contextlib.py”, line 210, in aenter
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 223, in build_async_engine_client_from_engine_args
(APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/utils/init.py”, line 1570, in inner
(APIServer pid=1) return fn(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py”, line 207, in from_vllm_config
(APIServer pid=1) return cls(
(APIServer pid=1) ^^^^
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py”, line 134, in init
(APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py”, line 102, in make_async_mp_client
(APIServer pid=1) return AsyncMPClient(*client_args)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py”, line 769, in init
(APIServer pid=1) super().init(
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py”, line 448, in init
(APIServer pid=1) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/lib/python3.12/contextlib.py”, line 144, in exit
(APIServer pid=1) next(self.gen)
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py”, line 732, in launch_core_engines
(APIServer pid=1) wait_for_engine_startup(
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py”, line 785, in wait_for_engine_startup
(APIServer pid=1) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d ’

Your error is a HIP out-of-memory (OOM) on both GPUs while vLLM is loading GLM-4.5-Air's weights: the traceback ends in the FusedMoE create_weights call (the w13_weight allocation), i.e. before any KV cache is even reserved. That means the BF16 weights alone do not fit in the available VRAM, so the context length (--max-model-len 4096), the batch settings (--max-num-seqs 2, --max-num-batched-tokens 16384), --gpu-memory-utilization 0.92, and --swap-space 22 never get a chance to matter. The error message suggests trying PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but that only mitigates fragmentation and will not help when the model simply does not fit. There is no evidence in the provided data sources that GLM-4.5-Air fits on 2x 24 GB GPUs at BF16, and the official vLLM recipes for GLM-4.5 recommend larger GPUs or more aggressive quantization for models of this size.
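For a rough sanity check (the figures below are assumptions: GLM-4.5-Air's model card lists roughly 106B total parameters, and BF16 uses 2 bytes per parameter), the weights alone land far beyond what two 24 GB cards can hold:

```
# Back-of-the-envelope weight footprint (assumed ~106e9 params, 2 bytes/param at BF16)
python3 -c "print(f'weights ~ {106e9 * 2 / 1e9:.0f} GB vs. VRAM {2 * 24} GB')"
# -> weights ~ 212 GB vs. VRAM 48 GB
```

Even split across both cards with --tensor-parallel-size 2, each GPU would still need on the order of 100 GB just for its shard of the weights, which is why the OOM hits during model loading.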

Lowering --max-model-len, --max-num-seqs, and --max-num-batched-tokens, or dropping --gpu-memory-utilization to 0.85 or lower, only reduces KV-cache and activation memory; since this failure happens while the weights are being loaded, the realistic options are a quantized checkpoint (if one is available and supported on this ROCm build), more GPU memory, or a smaller model. For more details on memory conservation strategies, see Conserving Memory - vLLM and Troubleshooting - vLLM.
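As a sketch only (assuming the same rocm/vllm container; <quantized-GLM-4.5-Air-repo> is a placeholder, not a specific checkpoint known to work on gfx1100), a lower-footprint serve invocation would look something like:

```
# Hypothetical sketch: substitute a real quantized GLM-4.5-Air repo that this
# ROCm build supports; BF16 weights alone will not fit in 2 x 24 GB.
vllm serve <quantized-GLM-4.5-Air-repo> \
  --tensor-parallel-size 2 \
  --max-model-len 2048 \
  --max-num-seqs 1 \
  --max-num-batched-tokens 2048 \
  --gpu-memory-utilization 0.85 \
  --enforce-eager \
  --trust-remote-code
```

All of these flags already appear in your original command; only the values are tightened, and the model argument is left as a placeholder because the checkpoint choice is the part that actually decides whether this fits.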

Would you like a step-by-step breakdown or example command for reducing memory usage?


Hint: Mention @RunLLM in the post for followups.