I'm using the vllm/vllm-openai-rocm:latest Docker image on Debian 13 with ROCm 7.2.2 on the host (inside the container it's 7.2.1.70201-81~22.04) and an AMD V620.
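For reference, the container is started with the usual ROCm device passthrough, roughly like this (typed from memory, so the exact flags and the host model path may differ; /path/to/model is just a placeholder):

# rough sketch of the docker invocation; /dev/kfd and /dev/dri are the standard ROCm device nodes,
# and the model directory on the host is bind-mounted to /model inside the container
docker run -it --rm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  --security-opt seccomp=unconfined \
  -v /path/to/model:/model \
  --entrypoint /bin/bash \
  vllm/vllm-openai-rocm:latest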
This is what happens when I try to start vllm:
root@e9640e87ad42:/app# vllm serve \
--model /model \
--dtype float16 \
--max-model-len 32768 \
--max-num-seqs 1
WARNING 05-03 09:58:41 [argparse_utils.py:257] With vllm serve, you should provide the model as a positional argument or in a config file instead of via the --model option. The --model option will be removed in a future version.
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299]
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299] █ █ █▄ ▄█
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.20.0
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299] █▄█▀ █ █ █ █ model /model
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:299]
(APIServer pid=8) INFO 05-03 09:58:41 [utils.py:233] non-default args: {'model_tag': '/model', 'model': '/model', 'dtype': 'float16', 'max_model_len': 32768, 'max_num_seqs': 1}
(APIServer pid=8) INFO 05-03 09:58:54 [nixl_utils.py:20] Setting UCX_RCACHE_MAX_UNRELEASED to '1024' to avoid a rare memory leak in UCX when using NIXL.
(APIServer pid=8) WARNING 05-03 09:58:54 [nixl_utils.py:34] NIXL is not available
(APIServer pid=8) WARNING 05-03 09:58:54 [nixl_utils.py:44] NIXL agent config is not available
(APIServer pid=8) INFO 05-03 09:58:54 [model.py:555] Resolved architecture: Qwen3ForCausalLM
(APIServer pid=8) WARNING 05-03 09:58:54 [model.py:2018] Casting torch.bfloat16 to torch.float16.
(APIServer pid=8) INFO 05-03 09:58:54 [model.py:1680] Using max model len 32768
(APIServer pid=8) INFO 05-03 09:58:54 [vllm.py:840] Asynchronous scheduling is enabled.
(APIServer pid=8) INFO 05-03 09:58:54 [kernel.py:205] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
(EngineCore pid=407) INFO 05-03 09:59:02 [core.py:109] Initializing a V1 LLM engine (v0.20.0) with config: model='/model', speculative_config=None, tokenizer='/model', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=True, quantization=None, quantization_config=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/model, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['+sparse_attn_indexer', 'none'], 'ir_enable_torch_wrap': True, 'splitting_ops': ['vllm::unified_attention_with_output', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::gdn_attention_core_xpu', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::deepseek_v4_attention', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': , 'encoder_cudagraph_max_vision_items_per_batch': 0, 'encoder_cudagraph_max_frames_per_batch': None, 'compile_sizes': , 'compile_ranges_endpoints': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 2, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': }, kernel_config=KernelConfig(ir_op_priority=IrOpPriorityConfig(rms_norm=['native']), enable_flashinfer_autotune=True, moe_backend='auto')
(EngineCore pid=407) WARNING 05-03 09:59:03 [nixl_utils.py:34] NIXL is not available
(EngineCore pid=407) WARNING 05-03 09:59:03 [nixl_utils.py:44] NIXL agent config is not available
(EngineCore pid=407) INFO 05-03 09:59:04 [parallel_state.py:1402] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.88.0.5:46367 backend=nccl
(EngineCore pid=407) INFO 05-03 09:59:04 [parallel_state.py:1715] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(APIServer pid=8) Traceback (most recent call last):
(APIServer pid=8) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=8) sys.exit(main())
(APIServer pid=8) ^^^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 92, in main
(APIServer pid=8) args.dispatch_function(args)
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=8) uvloop.run(run_server(args))
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=8) return __asyncio.run(
(APIServer pid=8) ^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=8) return runner.run(main)
(APIServer pid=8) ^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=8) return self._loop.run_until_complete(task)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=8) return await main
(APIServer pid=8) ^^^^^^^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 678, in run_server
(APIServer pid=8) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 692, in run_server_worker
(APIServer pid=8) async with build_async_engine_client(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=8) return await anext(self.gen)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=8) async with build_async_engine_client_from_engine_args(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=8) return await anext(self.gen)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=8) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 217, in from_vllm_config
(APIServer pid=8) return cls(
(APIServer pid=8) ^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 146, in __init__
(APIServer pid=8) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=8) return func(*args, **kwargs)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client
(APIServer pid=8) return AsyncMPClient(*client_args)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=8) return func(*args, **kwargs)
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 900, in __init__
(APIServer pid=8) super().__init__(
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 535, in __init__
(APIServer pid=8) with launch_core_engines(
(APIServer pid=8) ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=8) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=8) next(self.gen)
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1119, in launch_core_engines
(APIServer pid=8) wait_for_engine_startup(
(APIServer pid=8) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1178, in wait_for_engine_startup
(APIServer pid=8) raise RuntimeError(
(APIServer pid=8) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
root@e9640e87ad42:/app#
I see there are some warnings about nixl not being available - do I need to install something like rixl to make vllm work?