Is the reason for my vllm 0.20.0 failing to start because of nixl?

diarrheavomit · May 3, 2026, 10:06am

Using the vllm/vllm-openai-rocm:latest docker image on debian 13 with rocm 7.2.2 (inside the container it’s 7.2.1.70201-81~22.04) and an amd v620.

This is what happens when I try to start vllm:

root@e9640e87ad42:/app# vllm serve \

--model /model \
–dtype float16 \
–max-model-len 32768 \
–max-num-seqs 1

WARNING 05-03 09:58:41 [argparse_utils.py:257] With vllm serve, you should provide the model as a positional argument or in a config file instead of via the --model option. The --model option will be removed in a future version.
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299]
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299] █ █ █▄ ▄█
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.20.0
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299] █▄█▀ █ █ █ █ model /model
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299]
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:233] non-default args: {‘model_tag’: ‘/model’, ‘model’: ‘/model’, ‘dtype’: ‘float16’, ‘max_model_len’: 32768, ‘max_num_seqs’: 1}
(APIServer pid=8) INFO 05-03 09:58:54 [nixl_utils.py:20] Setting UCX_RCACHE_MAX_UNRELEASED to ‘1024’ to avoid a rare memory leak in UCX when using NIXL.
(APIServer pid=8) WARNING 05-03 09:58:54 [nixl_utils.py:34] NIXL is not available
(APIServer pid=8) WARNING 05-03 09:58:54 [nixl_utils.py:44] NIXL agent config is not available
(APIServer pid=8) INFO 05-03 09:58:54 [model.py:555] Resolved architecture: Qwen3ForCausalLM
(APIServer pid=8) WARNING 05-03 09:58:54 [model.py:2018] Casting torch.bfloat16 to torch.float16.
(APIServer pid=8) INFO 05-03 09:58:54 [model.py:1680] Using max model len 32768
(APIServer pid=8) INFO 05-03 09:58:54 [vllm.py:840] Asynchronous scheduling is enabled.
(APIServer pid=8) INFO 05-03 09:58:54 [kernel.py:205] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=[‘native’])
(EngineCore pid=407) INFO 05-03 09:59:02 [core.py:109] Initializing a V1 LLM engine (v0.20.0) with config: model=‘/model’, speculative_config=None, tokenizer=‘/model’, skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=True, quantization=None, quantization_config=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend=‘auto’, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=‘’, reasoning_parser_plugin=‘’, enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/model, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={‘mode’: <CompilationMode.VLLM_COMPILE: 3>, ‘debug_dump_path’: None, ‘cache_dir’: ‘’, ‘compile_cache_save_format’: ‘binary’, ‘backend’: ‘inductor’, ‘custom_ops’: [‘+sparse_attn_indexer’, ‘none’], ‘ir_enable_torch_wrap’: True, ‘splitting_ops’: [‘vllm::unified_attention_with_output’, ‘vllm::unified_mla_attention_with_output’, ‘vllm::mamba_mixer2’, ‘vllm::mamba_mixer’, ‘vllm::short_conv’, ‘vllm::linear_attention’, ‘vllm::plamo2_mamba_mixer’, ‘vllm::gdn_attention_core’, ‘vllm::gdn_attention_core_xpu’, ‘vllm::olmo_hybrid_gdn_full_forward’, ‘vllm::kda_attention’, ‘vllm::sparse_attn_indexer’, ‘vllm::rocm_aiter_sparse_attn_indexer’, ‘vllm::deepseek_v4_attention’, ‘vllm::unified_kv_cache_update’, ‘vllm::unified_mla_kv_cache_update’], ‘compile_mm_encoder’: False, ‘cudagraph_mm_encoder’: False, ‘encoder_cudagraph_token_budgets’: , ‘encoder_cudagraph_max_vision_items_per_batch’: 0, ‘encoder_cudagraph_max_frames_per_batch’: None, ‘compile_sizes’: , ‘compile_ranges_endpoints’: [2048], ‘inductor_compile_config’: {‘enable_auto_functionalized_v2’: False, ‘size_asserts’: False, ‘alignment_asserts’: False, ‘scalar_asserts’: False, ‘combo_kernels’: True, ‘benchmark_combo_kernel’: True}, ‘inductor_passes’: {}, ‘cudagraph_mode’: <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, ‘cudagraph_num_of_warmups’: 1, ‘cudagraph_capture_sizes’: [1, 2], ‘cudagraph_copy_inputs’: False, ‘cudagraph_specialize_lora’: True, ‘use_inductor_graph_partition’: False, ‘pass_config’: {‘fuse_norm_quant’: False, ‘fuse_act_quant’: False, ‘fuse_attn_quant’: False, ‘enable_sp’: False, ‘fuse_gemm_comms’: False, ‘fuse_allreduce_rms’: False}, ‘max_cudagraph_capture_size’: 2, ‘dynamic_shapes_config’: {‘type’: <DynamicShapesType.BACKED: ‘backed’>, ‘evaluate_guards’: False, ‘assume_32_bit_indexing’: False}, ‘local_cache_dir’: None, ‘fast_moe_cold_start’: True, ‘static_all_moe_layers’: }, kernel_config=KernelConfig(ir_op_priority=IrOpPriorityConfig(rms_norm=[‘native’]), enable_flashinfer_autotune=True, moe_backend=‘auto’)
(EngineCore pid=407) WARNING 05-03 09:59:03 [nixl_utils.py:34] NIXL is not available
(EngineCore pid=407) WARNING 05-03 09:59:03 [nixl_utils.py:44] NIXL agent config is not available
(EngineCore pid=407) INFO 05-03 09:59:04 [parallel_state.py:1402] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.88.0.5:46367 backend=nccl
(EngineCore pid=407) INFO 05-03 09:59:04 [parallel_state.py:1715] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(APIServer pid=8) Traceback (most recent call last):
(APIServer pid=8) File “/usr/local/bin/vllm”, line 10, in
(APIServer pid=8) sys.exit(main())
(APIServer pid=8) ^^^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py”, line 92, in main
(APIServer pid=8) args.dispatch_function(args)
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py”, line 122, in cmd
(APIServer pid=8) uvloop.run(run_server(args))
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/uvloop/init.py”, line 96, in run
(APIServer pid=8) return __asyncio.run(
(APIServer pid=8) ^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/lib/python3.12/asyncio/runners.py”, line 195, in run
(APIServer pid=8) return runner.run(main)
(APIServer pid=8) ^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/lib/python3.12/asyncio/runners.py”, line 118, in run
(APIServer pid=8) return self._loop.run_until_complete(task)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “uvloop/loop.pyx”, line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/uvloop/init.py”, line 48, in wrapper
(APIServer pid=8) return await main
(APIServer pid=8) ^^^^^^^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 678, in run_server
(APIServer pid=8) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 692, in run_server_worker
(APIServer pid=8) async with build_async_engine_client(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/lib/python3.12/contextlib.py”, line 210, in aenter
(APIServer pid=8) return await anext(self.gen)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 100, in build_async_engine_client
(APIServer pid=8) async with build_async_engine_client_from_engine_args(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/lib/python3.12/contextlib.py”, line 210, in aenter
(APIServer pid=8) return await anext(self.gen)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 136, in build_async_engine_client_from_engine_args
(APIServer pid=8) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py”, line 217, in from_vllm_config
(APIServer pid=8) return cls(
(APIServer pid=8) ^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py”, line 146, in init
(APIServer pid=8) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(APIServer pid=8) return func(*args, **kwargs)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py”, line 130, in make_async_mp_client
(APIServer pid=8) return AsyncMPClient(*client_args)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(APIServer pid=8) return func(*args, **kwargs)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py”, line 900, in init
(APIServer pid=8) super().init(
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py”, line 535, in init
(APIServer pid=8) with launch_core_engines(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File “/usr/lib/python3.12/contextlib.py”, line 144, in exit
(APIServer pid=8) next(self.gen)
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py”, line 1119, in launch_core_engines
(APIServer pid=8) wait_for_engine_startup(
(APIServer pid=8) File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py”, line 1178, in wait_for_engine_startup
(APIServer pid=8) raise RuntimeError(
(APIServer pid=8) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
root@e9640e87ad42:/app#

I see there are some warnings about nixl not being available - do I need to install something like rixl to make vllm work?

RunLLM · May 3, 2026, 10:06am

The error RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} is a generic vLLM startup failure. The warnings about “NIXL is not available” are not the cause of the crash—NIXL is only required for advanced multi-node or disaggregated setups, not for basic single-node inference. You do not need to install NIXL or RIXL for standard vLLM usage on a single node or GPU (source, source).

The actual root cause is not shown in your logs. This type of error on ROCm platforms (especially with AMD GPUs like V620) is often due to ROCm version or driver incompatibility, or unsupported GPU architectures. There are known issues with ROCm 7.x and certain AMD GPUs, and vLLM may not support all ROCm/AMD combinations out of the box (source, source). Please check the full logs for any segfaults or errors from ROCm, and verify that your GPU and ROCm version are supported by vLLM.

Would you like more detail on how to debug ROCm/vLLM compatibility or how to get more informative error logs?

Sources:

Hint: Mention @RunLLM in the post for followups.

diarrheavomit · May 3, 2026, 10:18am

@RunLLM

I’ve seen [Bug]: Abnormally bad performance on AMD ROCM gfx1030 (W6800, V620, 6900XT 6800XT) · Issue #38107 · vllm-project/vllm · GitHub run vllm recently using a v620, so it should be possible right?

Does vllm require specifically 7.2.0 (like the one in the post)? I just checked inside the container, and it’s running 7.2.1.70201-81~22.04 amd64 (7.2.2 is the version outside of the container).

Rocm-smi’s wording:
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 4 0x73a1, 55563 33.0°C 6.0W N/A, N/A, 0 0Mhz 96Mhz 0% auto 250.0W 0% 0%
1 3 0x73a1, 10554 32.0°C 7.0W N/A, N/A, 0 0Mhz 96Mhz 0% auto 250.0W 0% 0%
2 2 0x73a1, 31082 32.0°C 6.0W N/A, N/A, 0 0Mhz 96Mhz 0% auto 250.0W 0% 0%
3 1 0x73a1, 55960 34.0°C 6.0W N/A, N/A, 0 0Mhz 96Mhz 0% auto 250.0W 0% 0%
Uses node to describe a gpu - but they’re all on the same single socket computer.

How do I get more information on what’s causing the crash?

RunLLM · May 3, 2026, 10:19am

Yes, vLLM can run on AMD V620 with ROCm 7.2.x, as shown in issue #38107. vLLM does not require exactly ROCm 7.2.0; 7.2.1 and 7.2.2 are also supported, but compatibility can depend on the specific GPU and PyTorch build (source). The “node” in rocm-smi output just refers to the GPU index, not NUMA nodes.

To get more information about the crash, check the full logs above the “Engine core initialization failed” message for any errors or segfaults. Also, run rocminfo | grep gfx inside the container to confirm your GPU architecture, and try running a simple PyTorch script to verify ROCm is working. If you see a segfault or “No HIP GPUs are available,” it may be a permissions or driver mismatch issue (source).

Would you like step-by-step instructions for debugging ROCm/vLLM startup issues?

Sources:

Hint: Mention @RunLLM in the post for followups.

diarrheavomit · May 3, 2026, 2:13pm

@RunLLM

python -c “import torch; print(‘ROCm:’, torch.cuda.is_available()); print(‘GPU:’, torch.cuda.get_device_name(0)); print(‘Test:’, torch.ones(1).cuda())”

ROCm: True
GPU: AMD Radeon Pro V620
Segmentation fault

Pytorch itself just doesn’t work. But rocm llama-cpp works fine.

RunLLM · May 3, 2026, 2:13pm

Your output shows that PyTorch with ROCm detects the GPU but segfaults on .cuda(), indicating a PyTorch/ROCm compatibility or installation issue. This is a known problem if the PyTorch ROCm build, ROCm runtime, or driver versions are mismatched, or if the GPU architecture is not fully supported by the installed ROCm/PyTorch combination (source, source).

llama.cpp uses its own ROCm backend and does not depend on PyTorch, so it can work even if PyTorch+ROCm is broken. To resolve this, ensure your PyTorch ROCm wheel matches your ROCm runtime version and GPU architecture. Try reinstalling PyTorch using the official ROCm wheels for your ROCm version (instructions).

Would you like detailed steps for reinstalling PyTorch ROCm or checking version compatibility?

Sources:

Hint: Mention @RunLLM in the post for followups.

diarrheavomit · May 3, 2026, 2:14pm

@RunLLM

Shouldn’t that not be a problem with docker images?

RunLLM · May 3, 2026, 2:15pm

It can still be a problem with Docker images if the PyTorch ROCm wheel inside the container does not exactly match the ROCm runtime and driver version on the host, or if the container is missing required device mounts or permissions (source, source). Even with official images, version mismatches or missing /dev/kfd and /dev/dri mounts can cause segfaults.

Would you like a checklist for running ROCm containers with vLLM?

Sources:

https://github.com/vllm-project/vllm/issues/37151

Hint: Mention @RunLLM in the post for followups.

Topic		Replies	Views
Updated rocm/vllm latest and not anymore working General	1	347	November 4, 2025
RTX PRO 6000 users seek help, LLAMA 4 NVFP4 NVIDIA GPU Support	1	281	November 25, 2025
Why latest rocm vllm is so bad? General	3	301	December 14, 2025
How to setup amd gpu as default in dual stack gpu? AMD GPU Support	10	716	April 21, 2025
求救各位大佬看看是什么问题。cuda12.9，pytorch2.8，vllm0.11.0 General	3	218	November 14, 2025

Is the reason for my vllm 0.20.0 failing to start because of nixl?

Related topics