Memory access fault by GPU node-1 (Agent handle: 0x2d959d70) on address 0x75c49ec51000. Reason: Page not present or supervisor privilege.
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)     ^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1)     args.dispatch_function(args)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 60, in cmd
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1385, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1404, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 172, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 213, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 223, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 810, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 471, in __init__
(APIServer pid=1)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 903, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 960, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
This error indicates a GPU memory access fault, often caused by a hardware issue, a driver incompatibility, or a bug in the GPU compute stack (here ROCm) or in vLLM itself. The traceback shows that vLLM's engine core failed to initialize, which is a generic error that can result from such faults. The official troubleshooting guides recommend checking for hardware/driver issues, ensuring your ROCm and driver versions are compatible, and reviewing the logs for more detail. If the problem persists, update your drivers, ROCm, and vLLM, and verify that the hardware itself is healthy. See vLLM Troubleshooting and Python multiprocessing for more.
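When reviewing the logs as suggested, the line that matters is the low-level HSA fault printed before the traceback, not the generic RuntimeError at the end. A minimal sketch (plain Python, no vLLM dependency; the message format is taken from the fault line quoted above) that pulls those fault lines out of a captured log:

```python
import re

# The HSA runtime prints the real root cause in this shape before vLLM's
# generic "Engine core initialization failed" traceback appears.
FAULT_RE = re.compile(
    r"Memory access fault by GPU (\S+) \(Agent handle: (0x[0-9a-f]+)\) "
    r"on address (0x[0-9a-f]+)\. Reason: (.+)"
)

def find_gpu_faults(log_text: str):
    """Return (node, agent_handle, address, reason) tuples for each fault line."""
    return FAULT_RE.findall(log_text)

# Sample line copied from the log above.
sample = ("Memory access fault by GPU node-1 (Agent handle: 0x2d959d70) "
          "on address 0x75c49ec51000. Reason: Page not present or supervisor privilege.")
print(find_gpu_faults(sample))
```

Running this over the full container log (e.g. `docker logs <name>`) shows every fault along with which GPU node raised it, which helps when more than one agent is visible.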
Would you like more detail on how to debug or isolate the root cause?
Hint: Mention @RunLLM in the post for followups.
Why does this happen? This is how I run it:
docker run -it --dns=192.168.49.1 --network=host --group-add=video --group-add render \
  --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  --device /dev/kfd --device /dev/dri --shm-size=2g \
  -e HUGGING_FACE_HUB_TOKEN="" -e VLLM_SLEEP_WHEN_IDLE=1 \
  -e ROCM_VISIBLE_DEVICES=0 -e HIP_VISIBLE_DEVICES=0 \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.0 -e PYTORCH_ROCM_ARCH="gfx1150" \
  -e VLLM_USE_TRITON_FLASH_ATTN=0 -e GPU_MAX_HW_QUEUES=1 \
  -e NCCL_DEBUG=WARN -e NCCL_IB_DISABLE=1 \
  --restart=always --name vllm_rocm_gemma3_4b-it-nightly \
  -v /home/antti/vllm_models:/root/.cache/huggingface \
  rocm/vllm-dev:nightly \
  vllm serve google/gemma-3-4b-it --host 0.0.0.0 --port 8000 --enforce-eager \
    --served-model-name gemma3 --trust-remote-code --dtype bfloat16 \
    --kv-cache-dtype auto --max-model-len 1024 --max-num-seqs 1 \
    --max-num-batched-tokens 256 --gpu-memory-utilization 0.40 --max-log-len 100
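One thing worth ruling out with this setup: the command forces HSA_OVERRIDE_GFX_VERSION=11.0.0 while targeting PYTORCH_ROCM_ARCH=gfx1150, and running code objects built for a different ISA than the physical GPU may surface as faults like the one above. A hedged sketch (assumes the rocminfo tool shipped with ROCm; returns None when it is not on PATH) to see what ISA the GPU actually reports:

```python
import re
import shutil
import subprocess

def detect_gfx_target():
    """Return the first gfx* ISA target rocminfo reports, or None if unavailable."""
    if shutil.which("rocminfo") is None:
        return None  # not inside a ROCm environment / tools not installed
    out = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
    match = re.search(r"\bgfx[0-9a-f]+\b", out)
    return match.group(0) if match else None

print(detect_gfx_target())
```

If this prints something other than the target the kernels were built for, the override is papering over a real mismatch, which is one plausible source of "Page not present" faults.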
WARNING 03-15 00:20:43 [gpt_oss_triton_kernels_moe.py:56] Using legacy triton_kernels on ROCm
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297]
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297] █ █ █▄ ▄█
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.17.1rc1.dev113+g2cdf92228
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297] █▄█▀ █ █ █ █ model google/gemma-3-4b-it
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297]
Why does it say something about gpt_oss?
The warning about gpt_oss_triton_kernels appears because vLLM's ROCm backend shares some kernel code (such as MoE and quantization kernels) between different models, including GPT-OSS and Gemma. This message does not mean you are running a GPT-OSS model; it only indicates that the legacy Triton kernels (originally developed for GPT-OSS) are being used for certain backend operations on ROCm, even when serving Gemma-3-4b-it. This is expected and not an error; it is simply a shared code path kept for efficiency and compatibility (source).
Would you like more detail on how vLLM shares backend kernels across models?
I would like to know why this error happens on the HX 370.
(EngineCore pid=164) INFO 03-15 00:21:25 [parallel_state.py:1395] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://192.168.50.145:40069 backend=nccl
(EngineCore pid=164) INFO 03-15 00:21:25 [parallel_state.py:1717] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
Memory access fault by GPU node-1 (Agent handle: 0x3d69a810) on address 0x7bd7a2601000. Reason: Page not present or supervisor privilege.
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
This "Memory access fault by GPU node-1 … Reason: Page not present or supervisor privilege" error on the HX 370 indicates that the GPU tried to access memory it should not have, often due to a driver, ROCm, or hardware issue, or a bug in the model/kernel code. It is not specific to vLLM; it is a low-level GPU/ROCm fault (troubleshooting reference). Common causes include an incompatible ROCm version, insufficient permissions, or a kernel bug.
Would you like more detail on how to debug or resolve this on ROCm/HX370?
Yes, help me debug this on the HX 370.
The “Memory access fault by GPU node-1 … Reason: Page not present or supervisor privilege” error on HX370 is a GPU memory access violation, typically caused by ROCm driver issues, hardware incompatibility, or a bug in the model or kernel code. This is a low-level GPU/ROCm error, not specific to vLLM, and is often seen when the ROCm version, PyTorch, or kernel modules are not fully compatible with your hardware (troubleshooting guide).
To debug: check that your ROCm version matches your HX370 hardware and PyTorch build, ensure all drivers are up to date, and verify you have the correct permissions. Try running a simple PyTorch ROCm test to confirm basic GPU functionality. If the error persists, try a different ROCm version or update vLLM to the latest version. Would you like step-by-step instructions for these checks?
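The "simple PyTorch ROCm test" mentioned above can be as small as a matmul forced onto the GPU; on ROCm builds of PyTorch the device is still driven through the torch.cuda API. A minimal sketch that degrades gracefully when torch or a GPU is absent:

```python
import importlib.util

def gpu_smoke_test():
    """Tiny matmul on the GPU; a memory access fault would surface at the sync."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "no GPU visible to PyTorch"
    x = torch.randn(256, 256, device="cuda")
    y = (x @ x).sum()
    torch.cuda.synchronize()  # forces the kernel to actually run
    return f"OK (checksum {y.item():.1f}, hip={torch.version.hip})"

print(gpu_smoke_test())
```

If this crashes with the same "Memory access fault" inside the container, the problem is below vLLM (ROCm runtime, kernel driver, or ISA mismatch); if it passes, the fault is more likely in a specific vLLM kernel path.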