We have deployed the "Llama-4-Scout-17B-16E-Instruct" model on a virtual machine with a GPU. Our environment details are as follows:
- OS version: Ubuntu 22.04.5
- GPU model: NVIDIA A100
- vLLM version: 0.10
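The version details above can be confirmed from inside the serving virtualenv with a short script like the one below (a sketch that only prints what is already listed):

import platform

import torch
import vllm

# Print the versions that make up the serving environment described above.
print("OS      :", platform.platform())
print("Python  :", platform.python_version())
print("vLLM    :", vllm.__version__)
print("PyTorch :", torch.__version__)
print("CUDA    :", torch.version.cuda)
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}   :", torch.cuda.get_device_name(i))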
Recently, we noticed that the model's service unexpectedly went down. We restarted it and it resumed normal operation, but after running for a while the problem occurred again. We have tried to identify the root cause from the model logs but have not found any clues yet.
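To catch the outage earlier, a simple probe of the server's health endpoint along these lines can be run (a minimal sketch; the localhost:8000 address and the GET /health route are assumptions based on the defaults of the OpenAI-compatible server, since we do not pass --host or --port):

import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # default host/port assumed

while True:
    try:
        # The OpenAI-compatible server answers 200 on /health while the engine is alive.
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            if resp.status != 200:
                print("health check returned", resp.status)
    except (urllib.error.URLError, TimeoutError) as exc:
        # Once EngineCore dies, this probe starts failing and we restart the service by hand.
        print("vLLM appears to be down:", exc)
    time.sleep(30)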
We start the model using the following command:
python -m vllm.entrypoints.openai.api_server --served-model-name Llama-4-Scout-17B-16E-Instruct --model /models/Llama-4-Scout-17B-16E-Instruct --tensor-parallel-size 4 --gpu-memory-utilization 0.9 --max-model-len 131072 --limit-mm-per-prompt.image 10 >> /var/log/vllm/vllm-Llama-4-Scout-17B-16E-Instruct.log &
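For context, the failing requests in the log are streaming chat completions served through the OpenAI-compatible API, and the server is configured to allow up to 10 images per prompt. An illustrative client call is sketched below; the URL, API key, image, and prompt are placeholders, and the openai Python client is assumed to be installed (this is not one of the actual failing requests, which are only identified by the request IDs in the log):

from openai import OpenAI

# Placeholder client pointed at the local OpenAI-compatible endpoint (assumed address).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="Llama-4-Scout-17B-16E-Instruct",
    stream=True,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
            ],
        }
    ],
)

# Print the streamed deltas as they arrive.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)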
Do you have any ideas about what might be causing this issue? I have also included the error log below for your reference.
==================================================================
ERROR 08-28 10:34:20 [core.py:634] EngineCore encountered a fatal error.
ERROR 08-28 10:34:20 [core.py:634] Traceback (most recent call last):
ERROR 08-28 10:34:20 [core.py:634] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 625, in run_engine_core
ERROR 08-28 10:34:20 [core.py:634] engine_core.run_busy_loop()
ERROR 08-28 10:34:20 [core.py:634] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 652, in run_busy_loop
ERROR 08-28 10:34:20 [core.py:634] self._process_engine_step()
ERROR 08-28 10:34:20 [core.py:634] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 677, in _process_engine_step
ERROR 08-28 10:34:20 [core.py:634] outputs, model_executed = self.step_fn()
ERROR 08-28 10:34:20 [core.py:634] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 266, in step
ERROR 08-28 10:34:20 [core.py:634] scheduler_output = self.scheduler.schedule()
ERROR 08-28 10:34:20 [core.py:634] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/core/sched/scheduler.py", line 440, in schedule
ERROR 08-28 10:34:20 [core.py:634] new_blocks = self.kv_cache_manager.allocate_slots(
ERROR 08-28 10:34:20 [core.py:634] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/core/kv_cache_manager.py", line 302, in allocate_slots
ERROR 08-28 10:34:20 [core.py:634] self.coordinator.cache_blocks(
ERROR 08-28 10:34:20 [core.py:634] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/core/kv_cache_coordinator.py", line 113, in cache_blocks
ERROR 08-28 10:34:20 [core.py:634] manager.cache_blocks(request, block_hashes, num_computed_tokens)
ERROR 08-28 10:34:20 [core.py:634] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/core/single_type_kv_cache_manager.py", line 146, in cache_blocks
ERROR 08-28 10:34:20 [core.py:634] self.block_pool.cache_full_blocks(
ERROR 08-28 10:34:20 [core.py:634] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/core/block_pool.py", line 138, in cache_full_blocks
ERROR 08-28 10:34:20 [core.py:634] assert prev_block.block_hash is not None
ERROR 08-28 10:34:20 [core.py:634] AssertionError
ERROR 08-28 10:34:20 [async_llm.py:416] AsyncLLM output_handler failed.
ERROR 08-28 10:34:20 [async_llm.py:416] Traceback (most recent call last):
ERROR 08-28 10:34:20 [async_llm.py:416] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 375, in output_handler
ERROR 08-28 10:34:20 [async_llm.py:416] outputs = await engine_core.get_output_async()
ERROR 08-28 10:34:20 [async_llm.py:416] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 751, in get_output_async
ERROR 08-28 10:34:20 [async_llm.py:416] raise self._format_exception(outputs) from None
ERROR 08-28 10:34:20 [async_llm.py:416] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 08-28 10:34:20 [async_llm.py:342] Request chatcmpl-22e1134981ad4966a27358f0bbec386d failed (engine dead).
ERROR 08-28 10:34:20 [serving_chat.py:932] Error in chat completion stream generator.
ERROR 08-28 10:34:20 [serving_chat.py:932] Traceback (most recent call last):
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 497, in chat_completion_stream_generator
ERROR 08-28 10:34:20 [serving_chat.py:932] async for res in result_generator:
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 323, in generate
ERROR 08-28 10:34:20 [serving_chat.py:932] out = q.get_nowait() or await q.get()
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/output_processor.py", line 57, in get
ERROR 08-28 10:34:20 [serving_chat.py:932] raise output
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 375, in output_handler
ERROR 08-28 10:34:20 [serving_chat.py:932] outputs = await engine_core.get_output_async()
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 751, in get_output_async
ERROR 08-28 10:34:20 [serving_chat.py:932] raise self._format_exception(outputs) from None
ERROR 08-28 10:34:20 [serving_chat.py:932] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 08-28 10:34:20 [async_llm.py:342] Request chatcmpl-b6381142fd1f43e38e5173bd9b545ac2 failed (engine dead).
ERROR 08-28 10:34:20 [serving_chat.py:932] Error in chat completion stream generator.
ERROR 08-28 10:34:20 [serving_chat.py:932] Traceback (most recent call last):
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 497, in chat_completion_stream_generator
ERROR 08-28 10:34:20 [serving_chat.py:932] async for res in result_generator:
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 323, in generate
ERROR 08-28 10:34:20 [serving_chat.py:932] out = q.get_nowait() or await q.get()
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/output_processor.py", line 57, in get
ERROR 08-28 10:34:20 [serving_chat.py:932] raise output
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 497, in chat_completion_stream_generator
ERROR 08-28 10:34:20 [serving_chat.py:932] async for res in result_generator:
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 323, in generate
ERROR 08-28 10:34:20 [serving_chat.py:932] out = q.get_nowait() or await q.get()
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/output_processor.py", line 57, in get
ERROR 08-28 10:34:20 [serving_chat.py:932] raise output
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 375, in output_handler
ERROR 08-28 10:34:20 [serving_chat.py:932] outputs = await engine_core.get_output_async()
ERROR 08-28 10:34:20 [serving_chat.py:932] File "/home/llmsvc1/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 751, in get_output_async
ERROR 08-28 10:34:20 [serving_chat.py:932] raise self._format_exception(outputs) from None
ERROR 08-28 10:34:20 [serving_chat.py:932] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
Thank you.
Regards,
Jimmy