Setting the env variable in my OS instead of the Docker container gave me this (the behavior was different when launching the model):
INFO 04-21 18:53:38 [api_server.py:209] Started engine process with PID 44
config.json: 100%|█████████████████████████████████████████████████████| 679/679 [00:00<00:00, 8.48MB/s]
INFO 04-21 18:53:40 [__init__.py:256] Automatically detected platform rocm.
INFO 04-21 18:53:49 [config.py:578] This model supports multiple tasks: {'generate', 'classify', 'embed', 'score', 'reward'}. Defaulting to 'generate'.
INFO 04-21 18:53:50 [config.py:578] This model supports multiple tasks: {'classify', 'generate', 'reward', 'embed', 'score'}. Defaulting to 'generate'.
INFO 04-21 18:53:51 [config.py:1508] Disabled the custom all-reduce kernel because it is not working correctly when using two AMD Navi GPUs.
INFO 04-21 18:53:51 [config.py:1520] Disabled the custom all-reduce kernel because it is not working correctly when using two AMD Navi GPUs.
WARNING 04-21 18:53:51 [arg_utils.py:1282] The model has a long context length (131072). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
INFO 04-21 18:53:51 [rocm.py:228] Aiter main switch (VLLM_USE_AITER) is not set. Disabling individual Aiter components
tokenizer_config.json: 100%|███████████████████████████████████████| 3.07k/3.07k [00:00<00:00, 40.6MB/s]
INFO 04-21 18:53:52 [config.py:1508] Disabled the custom all-reduce kernel because it is not working correctly when using two AMD Navi GPUs.
INFO 04-21 18:53:52 [config.py:1520] Disabled the custom all-reduce kernel because it is not working correctly when using two AMD Navi GPUs.
WARNING 04-21 18:53:52 [arg_utils.py:1282] The model has a long context length (131072). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
INFO 04-21 18:53:52 [rocm.py:228] Aiter main switch (VLLM_USE_AITER) is not set. Disabling individual Aiter components
INFO 04-21 18:53:52 [engine.py:77] Initializing a V0 LLM engine (v0.7.4.dev388+g51641aaa7) with config: model='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
tokenizer.json: 100%|██████████████████████████████████████████████| 7.03M/7.03M [00:01<00:00, 4.40MB/s]
generation_config.json: 100%|███████████████████████████████████████████| 181/181 [00:00<00:00, 615kB/s]
INFO 04-21 18:53:56 [rocm.py:133] None is not supported in AMD GPUs.
INFO 04-21 18:53:56 [rocm.py:134] Using ROCmFlashAttention backend.
[rank0]:[W421 18:53:56.974263691 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
INFO 04-21 18:53:57 [parallel_state.py:948] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 04-21 18:53:57 [model_runner.py:1115] Starting to load model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
WARNING 04-21 18:53:57 [rocm.py:239] Model architecture 'Qwen2ForCausalLM' is partially supported by ROCm: Sliding window attention (SWA) is not yet supported in Triton flash attention. For half-precision SWA support, please use CK flash attention by setting `VLLM_USE_TRITON_FLASH_ATTN=0`
ERROR 04-21 18:53:57 [engine.py:411] HIP error: invalid device function
ERROR 04-21 18:53:57 [engine.py:411] HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 04-21 18:53:57 [engine.py:411] For debugging consider passing AMD_SERIALIZE_KERNEL=3
ERROR 04-21 18:53:57 [engine.py:411] Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
ERROR 04-21 18:53:57 [engine.py:411] Traceback (most recent call last):
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine
ERROR 04-21 18:53:57 [engine.py:411] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 125, in from_engine_args
ERROR 04-21 18:53:57 [engine.py:411] return cls(ipc_path=ipc_path,
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 77, in __init__
ERROR 04-21 18:53:57 [engine.py:411] self.engine = LLMEngine(*args, **kwargs)
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "vllm/engine/llm_engine.py", line 274, in vllm.engine.llm_engine.LLMEngine.__init__
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 04-21 18:53:57 [engine.py:411] self._init_executor()
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 04-21 18:53:57 [engine.py:411] self.collective_rpc("load_model")
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-21 18:53:57 [engine.py:411] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2444, in run_method
ERROR 04-21 18:53:57 [engine.py:411] return func(*args, **kwargs)
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 211, in load_model
ERROR 04-21 18:53:57 [engine.py:411] self.model_runner.load_model()
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1118, in load_model
ERROR 04-21 18:53:57 [engine.py:411] self.model = get_model(vllm_config=self.vllm_config)
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 04-21 18:53:57 [engine.py:411] return loader.load_model(vllm_config=vllm_config)
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 423, in load_model
ERROR 04-21 18:53:57 [engine.py:411] model = _initialize_model(vllm_config=vllm_config)
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 126, in _initialize_model
ERROR 04-21 18:53:57 [engine.py:411] return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 431, in __init__
ERROR 04-21 18:53:57 [engine.py:411] self.model = Qwen2Model(vllm_config=vllm_config,
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
ERROR 04-21 18:53:57 [engine.py:411] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 300, in __init__
ERROR 04-21 18:53:57 [engine.py:411] self.start_layer, self.end_layer, self.layers = make_layers(
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 558, in make_layers
ERROR 04-21 18:53:57 [engine.py:411] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 302, in <lambda>
ERROR 04-21 18:53:57 [engine.py:411] lambda prefix: Qwen2DecoderLayer(config=config,
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 206, in __init__
ERROR 04-21 18:53:57 [engine.py:411] self.self_attn = Qwen2Attention(
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 153, in __init__
ERROR 04-21 18:53:57 [engine.py:411] self.rotary_emb = get_rope(
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 1111, in get_rope
ERROR 04-21 18:53:57 [engine.py:411] rotary_emb = RotaryEmbedding(head_size, rotary_dim, max_position, base,
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 99, in __init__
ERROR 04-21 18:53:57 [engine.py:411] cache = self._compute_cos_sin_cache()
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 116, in _compute_cos_sin_cache
ERROR 04-21 18:53:57 [engine.py:411] inv_freq = self._compute_inv_freq(self.base)
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 110, in _compute_inv_freq
ERROR 04-21 18:53:57 [engine.py:411] inv_freq = 1.0 / (base**(torch.arange(
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py", line 104, in __torch_function__
ERROR 04-21 18:53:57 [engine.py:411] return func(*args, **kwargs)
ERROR 04-21 18:53:57 [engine.py:411] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-21 18:53:57 [engine.py:411] RuntimeError: HIP error: invalid device function
ERROR 04-21 18:53:57 [engine.py:411] HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 04-21 18:53:57 [engine.py:411] For debugging consider passing AMD_SERIALIZE_KERNEL=3
ERROR 04-21 18:53:57 [engine.py:411] Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
ERROR 04-21 18:53:57 [engine.py:411]
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 413, in run_mp_engine
raise e
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 125, in from_engine_args
return cls(ipc_path=ipc_path,
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 77, in __init__
self.engine = LLMEngine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "vllm/engine/llm_engine.py", line 274, in vllm.engine.llm_engine.LLMEngine.__init__
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
self.collective_rpc("load_model")
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2444, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 211, in load_model
self.model_runner.load_model()
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1118, in load_model
self.model = get_model(vllm_config=self.vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
return loader.load_model(vllm_config=vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 423, in load_model
model = _initialize_model(vllm_config=vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 126, in _initialize_model
return model_class(vllm_config=vllm_config, prefix=prefix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 431, in __init__
self.model = Qwen2Model(vllm_config=vllm_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 300, in __init__
self.start_layer, self.end_layer, self.layers = make_layers(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 558, in make_layers
maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 302, in <lambda>
lambda prefix: Qwen2DecoderLayer(config=config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 206, in __init__
self.self_attn = Qwen2Attention(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2.py", line 153, in __init__
self.rotary_emb = get_rope(
^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 1111, in get_rope
rotary_emb = RotaryEmbedding(head_size, rotary_dim, max_position, base,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 99, in __init__
cache = self._compute_cos_sin_cache()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 116, in _compute_cos_sin_cache
inv_freq = self._compute_inv_freq(self.base)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/rotary_embedding.py", line 110, in _compute_inv_freq
inv_freq = 1.0 / (base**(torch.arange(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py", line 104, in __torch_function__
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
[rank0]:[W421 18:53:57.714810830 ProcessGroupNCCL.cpp:1505] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Traceback (most recent call last):
File "/usr/local/bin/vllm", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
args.dispatch_function(args)
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 33, in cmd
uvloop.run(run_server(args))
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 947, in run_server
async with build_async_engine_client(args) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 139, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
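Looking at the traceback, the crash happens on a plain torch.arange on the GPU inside _compute_inv_freq, so the failure is probably not vLLM-specific. Here is a minimal sketch (only standard PyTorch calls, run inside the same container) that should reproduce it if the PyTorch/ROCm build simply has no kernels for this GPU’s gfx target:

    # Minimal HIP smoke test; "invalid device function" usually means the
    # PyTorch/ROCm build was not compiled for this card's gfx architecture.
    import os
    os.environ["AMD_SERIALIZE_KERNEL"] = "3"  # debug flag suggested in the error above

    import torch

    print(torch.__version__, torch.version.hip)  # confirm this is a ROCm build
    print(torch.cuda.is_available())             # ROCm devices show up through the cuda API
    print(torch.cuda.get_device_name(0))         # which GPU PyTorch actually sees

    # Same kind of op that fails in rotary_embedding._compute_inv_freq:
    t = torch.arange(0, 8, 2, dtype=torch.float, device="cuda")
    print(1.0 / (10000.0 ** (t / 8)))            # should print values, not raise a HIP error

If that arange alone already raises the same HIP error, the problem is in the PyTorch build (or in how the gfx override reaches it) rather than in vLLM itself.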
Setting the env variable in both the container and the host OS gave mostly the same output, but it failed even earlier than the previous run. With the env variable set in both the OS and the container, the last error is this:
ValueError: Model architectures ['Qwen2ForCausalLM'] failed to be inspected. Please check the logs for more details.
It seems that ROCm gets detected, but for some reason it keeps failing to launch the model.
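One more thing I can double-check: environment variables set on the host are not visible inside a Docker container unless they are passed through explicitly (for example with docker run -e VAR=value). A quick way to verify from inside the container, with a placeholder for the actual variable name:

    # Check whether the variable actually reached the container's environment.
    # "THE_ENV_VARIABLE" is a placeholder for the variable I am setting.
    import os
    print(os.environ.get("THE_ENV_VARIABLE", "<not set in this environment>"))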
If you know anything that could help troubleshoot this, I’d really appreciate it, since I don’t know what to do now.
Another possibility is to run my AMD GPU through ZLUDA so it can expose a CUDA interface, but I’m not sure whether that would actually help with launching vLLM.
EDIT: This is supposed to be an extension of my previous message.