RTX PRO 6000 users seek help, LLAMA 4 NVFP4

win10ogod · November 25, 2025, 5:11am

docker run --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -v E:/text-generation-webui-1.14/user_data/models:/model vllm-blackwell:latest python -m vllm.entrypoints.openai.api_server --model /model/Llama4-Scout17B-NVFP4 --trust-remote-code --host 0.0.0.0 --port 8000 --max-model-len 131072 --gpu-memory-utilization 0.95 --served-model-name Llama4-Scout17B-NVFP4 --chat-template /model/Llama4-Scout17B-NVFP4/chat_template.jinja --kv-cache-dtype fp8 --no-enable-prefix-caching true --async-scheduling --mm-encoder-tp-mode data --enable-auto-tool-choice --tool-call-parser llama4_pythonic --quantization modelopt_fp4

(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] EngineCore failed to start.
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] Traceback (most recent call last):
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/engine/core.py”, line 833, in run_engine_core
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/engine/core.py”, line 609, in _init_
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] super()._init_(
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/engine/core.py”, line 109, in _init_
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/engine/core.py”, line 234, in _initialize_kv_caches
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/executor/abstract.py”, line 126, in determine_available_memory
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return self.collective_rpc(“determine_available_memory”)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/executor/uniproc_executor.py”, line 75, in collective_rpc
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/serial_utils.py”, line 479, in run_method
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return func(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py”, line 116, in decorate_context
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return func(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/worker/gpu_worker.py”, line 313, in determine_available_memory
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] self.model_runner.profile_run()
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/worker/gpu_model_runner.py”, line 4145, in profile_run
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py”, line 116, in decorate_context
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return func(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/v1/worker/gpu_model_runner.py”, line 3864, in _dummy_run
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] outputs = self.model(
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/compilation/cuda_graph.py”, line 126, in _call_
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1751, in _wrapped_call_impl
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1762, in _call_impl
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/model_executor/models/mllama4.py”, line 901, in forward
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return self.language_model(
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1751, in _wrapped_call_impl
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1762, in _call_impl
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/model_executor/models/llama.py”, line 617, in forward
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] model_output = self.model(
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/compilation/decorators.py”, line 471, in _call_
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] output = TorchCompileWithNoGuardsWrapper._call_(self, *args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/workspace/vllm/vllm/compilation/wrapper.py”, line 149, in _call_
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return self._compiled_callable(*args, **kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py”, line 671, in _fn
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/output_graph.py”, line 1569, in _call_user_compiler
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] raise BackendCompilerFailed(
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/output_graph.py”, line 1544, in _call_user_compiler
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] compiled_fn = compiler_fn(gm, self.example_inputs())
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/repro/after_dynamo.py”, line 150, in _call_
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] compiled_gm = compiler_fn(gm, example_inputs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/repro/after_dynamo.py”, line 150, in _call_
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] compiled_gm = compiler_fn(gm, example_inputs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] File “/usr/local/lib/python3.12/dist-packages/torch/_init_.py”, line 2400, in _call_
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] return self.compiler_fn(model_, inputs_, **self.kwargs)
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] torch._dynamo.exc.BackendCompilerFailed: backend=‘<vllm.compilation.backends.VllmBackend object at 0x7d4e1c77fec0>’ raised:
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] TypeError: VllmBackend._call_() got an unexpected keyword argument ‘options’
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842]
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you’re reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS=“+dynamo”
(EngineCore_DP0 pid=221) ERROR 11-25 04:36:24 [core.py:842]
(EngineCore_DP0 pid=221) Process EngineCore_DP0:
(EngineCore_DP0 pid=221) Traceback (most recent call last):
(EngineCore_DP0 pid=221) File “/usr/lib/python3.12/multiprocessing/process.py”, line 314, in _bootstrap
(EngineCore_DP0 pid=221) self.run()
(EngineCore_DP0 pid=221) File “/usr/lib/python3.12/multiprocessing/process.py”, line 108, in run
(EngineCore_DP0 pid=221) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/engine/core.py”, line 846, in run_engine_core
(EngineCore_DP0 pid=221) raise e
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/engine/core.py”, line 833, in run_engine_core
(EngineCore_DP0 pid=221) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/engine/core.py”, line 609, in _init_
(EngineCore_DP0 pid=221) super()._init_(
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/engine/core.py”, line 109, in _init_
(EngineCore_DP0 pid=221) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/engine/core.py”, line 234, in _initialize_kv_caches
(EngineCore_DP0 pid=221) available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/executor/abstract.py”, line 126, in determine_available_memory
(EngineCore_DP0 pid=221) return self.collective_rpc(“determine_available_memory”)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/executor/uniproc_executor.py”, line 75, in collective_rpc
(EngineCore_DP0 pid=221) result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/serial_utils.py”, line 479, in run_method
(EngineCore_DP0 pid=221) return func(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py”, line 116, in decorate_context
(EngineCore_DP0 pid=221) return func(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/worker/gpu_worker.py”, line 313, in determine_available_memory
(EngineCore_DP0 pid=221) self.model_runner.profile_run()
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/worker/gpu_model_runner.py”, line 4145, in profile_run
(EngineCore_DP0 pid=221) hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py”, line 116, in decorate_context
(EngineCore_DP0 pid=221) return func(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/v1/worker/gpu_model_runner.py”, line 3864, in _dummy_run
(EngineCore_DP0 pid=221) outputs = self.model(
(EngineCore_DP0 pid=221) ^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/compilation/cuda_graph.py”, line 126, in _call_
(EngineCore_DP0 pid=221) return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1751, in _wrapped_call_impl
(EngineCore_DP0 pid=221) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1762, in _call_impl
(EngineCore_DP0 pid=221) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/model_executor/models/mllama4.py”, line 901, in forward
(EngineCore_DP0 pid=221) return self.language_model(
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1751, in _wrapped_call_impl
(EngineCore_DP0 pid=221) return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1762, in _call_impl
(EngineCore_DP0 pid=221) return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/model_executor/models/llama.py”, line 617, in forward
(EngineCore_DP0 pid=221) model_output = self.model(
(EngineCore_DP0 pid=221) ^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/compilation/decorators.py”, line 471, in _call_
(EngineCore_DP0 pid=221) output = TorchCompileWithNoGuardsWrapper._call_(self, *args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/workspace/vllm/vllm/compilation/wrapper.py”, line 149, in _call_
(EngineCore_DP0 pid=221) return self._compiled_callable(*args, **kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py”, line 671, in _fn
(EngineCore_DP0 pid=221) raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/output_graph.py”, line 1569, in _call_user_compiler
(EngineCore_DP0 pid=221) raise BackendCompilerFailed(
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/output_graph.py”, line 1544, in _call_user_compiler
(EngineCore_DP0 pid=221) compiled_fn = compiler_fn(gm, self.example_inputs())
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/repro/after_dynamo.py”, line 150, in _call_
(EngineCore_DP0 pid=221) compiled_gm = compiler_fn(gm, example_inputs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/_dynamo/repro/after_dynamo.py”, line 150, in _call_
(EngineCore_DP0 pid=221) compiled_gm = compiler_fn(gm, example_inputs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) File “/usr/local/lib/python3.12/dist-packages/torch/_init_.py”, line 2400, in _call_
(EngineCore_DP0 pid=221) return self.compiler_fn(model_, inputs_, **self.kwargs)
(EngineCore_DP0 pid=221) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=221) torch._dynamo.exc.BackendCompilerFailed: backend=‘<vllm.compilation.backends.VllmBackend object at 0x7d4e1c77fec0>’ raised:
(EngineCore_DP0 pid=221) TypeError: VllmBackend._call_() got an unexpected keyword argument ‘options’
(EngineCore_DP0 pid=221)
(EngineCore_DP0 pid=221) Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you’re reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS=“+dynamo”
(EngineCore_DP0 pid=221)
[rank0]:[W1125 04:36:24.755274875 ProcessGroupNCCL.cpp:1505] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see Distributed communication package - torch.distributed — PyTorch 2.9 documentation (function operator())
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File “”, line 198, in _run_module_as_main
(APIServer pid=1) File “”, line 88, in _run_code
(APIServer pid=1) File “/workspace/vllm/vllm/entrypoints/openai/api_server.py”, line 2178, in
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/uvloop/_init_.py”, line 96, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/lib/python3.12/asyncio/runners.py”, line 194, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/usr/lib/python3.12/asyncio/runners.py”, line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “uvloop/loop.pyx”, line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File “/usr/local/lib/python3.12/dist-packages/uvloop/_init_.py”, line 48, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File “/workspace/vllm/vllm/entrypoints/openai/api_server.py”, line 2106, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File “/workspace/vllm/vllm/entrypoints/openai/api_server.py”, line 2125, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) File “/usr/lib/python3.12/contextlib.py”, line 210, in _aenter_
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/workspace/vllm/vllm/entrypoints/openai/api_server.py”, line 196, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) File “/usr/lib/python3.12/contextlib.py”, line 210, in _aenter_
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/workspace/vllm/vllm/entrypoints/openai/api_server.py”, line 237, in build_async_engine_client_from_engine_args
(APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/workspace/vllm/vllm/utils/func_utils.py”, line 116, in inner
(APIServer pid=1) return fn(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/workspace/vllm/vllm/v1/engine/async_llm.py”, line 219, in from_vllm_config
(APIServer pid=1) return cls(
(APIServer pid=1) ^^^^
(APIServer pid=1) File “/workspace/vllm/vllm/v1/engine/async_llm.py”, line 133, in _init_
(APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/workspace/vllm/vllm/v1/engine/core_client.py”, line 121, in make_async_mp_client
(APIServer pid=1) return AsyncMPClient(*client_args)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/workspace/vllm/vllm/v1/engine/core_client.py”, line 808, in _init_
(APIServer pid=1) super()._init_(
(APIServer pid=1) File “/workspace/vllm/vllm/v1/engine/core_client.py”, line 469, in _init_
(APIServer pid=1) with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=1) File “/usr/lib/python3.12/contextlib.py”, line 144, in _exit_
(APIServer pid=1) next(self.gen)
(APIServer pid=1) File “/workspace/vllm/vllm/v1/engine/utils.py”, line 907, in launch_core_engines
(APIServer pid=1) wait_for_engine_startup(
(APIServer pid=1) File “/workspace/vllm/vllm/v1/engine/utils.py”, line 964, in wait_for_engine_startup
(APIServer pid=1) raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

RunLLM · November 25, 2025, 5:12am

Your error is caused by a TypeError: VllmBackend.__call__() got an unexpected keyword argument 'options'. This indicates a compatibility issue between your vLLM version and the installed PyTorch version—recent PyTorch releases (2.3+) pass an options argument to backend compilers, but older vLLM versions do not expect it, resulting in this crash. This is a known problem and is not specific to your model or Docker setup, but rather to the vLLM/PyTorch interface itself (see related issues).

To resolve this, upgrade to the latest vLLM nightly build, which includes a fix for this interface change. Alternatively, you can downgrade PyTorch to a version before 2.3, but upgrading vLLM is the recommended solution (see discussion). No other command-line flags or settings will fix this error.

Would you like more detail on how to upgrade vLLM in your Docker environment?

Sources:

vLLM Issue #29265

Hint: Mention @RunLLM in the post for followups.

Topic		Replies	Views
Can anyone help me? Why is this not working? It used 😭 NVIDIA GPU Support	1	1040	May 8, 2025
RTX PRO6000运行Qwen3-32B-FP8量化版本出错 General	1	414	July 4, 2025
求救各位大佬看看是什么问题。cuda12.9，pytorch2.8，vllm0.11.0 General	3	161	November 14, 2025
Added second 5090 and turne on tensor parallel 2 General	9	324	September 18, 2025
Why is this not working? I corrected it but still NVIDIA GPU Support	1	812	May 8, 2025

RTX PRO 6000 users seek help, LLAMA 4 NVFP4

Related topics