Memory access fault by GPU node-1 (Agent handle: 0x2d959d70) on address 0x75c49ec51000. Reason: Page not present or supervisor privilege.
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)     ^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1)     args.dispatch_function(args)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 60, in cmd
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1385, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1404, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 172, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 213, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 223, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 810, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 471, in __init__
(APIServer pid=1)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 903, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 960, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
This error indicates a GPU memory access fault, often caused by a hardware issue, a driver incompatibility, or a bug in the GPU compute stack (here ROCm) or in vLLM itself. The traceback shows that vLLM's engine core failed to initialize, which is a generic error that can result from such faults. The official troubleshooting guides recommend checking for hardware/driver issues, ensuring your ROCm and driver versions are compatible, and reviewing the logs for more detail. If the problem persists, update your drivers, ROCm, and vLLM, and verify that the hardware itself is healthy. See vLLM Troubleshooting and Python multiprocessing for more.
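When reviewing the logs as suggested, the line that matters is the low-level HSA fault printed before the traceback, not the generic RuntimeError at the end. A minimal sketch (plain Python, no vLLM dependency; the message format is taken from the fault line quoted above) that pulls those fault lines out of a captured log:

```python
import re

# The HSA runtime prints the real root cause in this shape before vLLM's
# generic "Engine core initialization failed" traceback appears.
FAULT_RE = re.compile(
    r"Memory access fault by GPU (\S+) \(Agent handle: (0x[0-9a-f]+)\) "
    r"on address (0x[0-9a-f]+)\. Reason: (.+)"
)

def find_gpu_faults(log_text: str):
    """Return (node, agent_handle, address, reason) tuples for each fault line."""
    return FAULT_RE.findall(log_text)

# Sample line copied from the log above.
sample = ("Memory access fault by GPU node-1 (Agent handle: 0x2d959d70) "
          "on address 0x75c49ec51000. Reason: Page not present or supervisor privilege.")
print(find_gpu_faults(sample))
```

Running this over the full container log (e.g. `docker logs <name>`) shows every fault along with which GPU node raised it, which helps when more than one agent is visible.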
Would you like more detail on how to debug or isolate the root cause?
Hint: Mention @RunLLM in the post for followups.
Why does this happen? This is how I run it:
docker run -it --dns=192.168.49.1 --network=host --group-add=video --group-add render \
  --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  --device /dev/kfd --device /dev/dri --shm-size=2g \
  -e HUGGING_FACE_HUB_TOKEN="" -e VLLM_SLEEP_WHEN_IDLE=1 \
  -e ROCM_VISIBLE_DEVICES=0 -e HIP_VISIBLE_DEVICES=0 \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.0 -e PYTORCH_ROCM_ARCH="gfx1150" \
  -e VLLM_USE_TRITON_FLASH_ATTN=0 -e GPU_MAX_HW_QUEUES=1 \
  -e NCCL_DEBUG=WARN -e NCCL_IB_DISABLE=1 \
  --restart=always --name vllm_rocm_gemma3_4b-it-nightly \
  -v /home/antti/vllm_models:/root/.cache/huggingface \
  rocm/vllm-dev:nightly \
  vllm serve google/gemma-3-4b-it --host 0.0.0.0 --port 8000 --enforce-eager \
    --served-model-name gemma3 --trust-remote-code --dtype bfloat16 \
    --kv-cache-dtype auto --max-model-len 1024 --max-num-seqs 1 \
    --max-num-batched-tokens 256 --gpu-memory-utilization 0.40 --max-log-len 100
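One thing worth ruling out with this setup: the command forces HSA_OVERRIDE_GFX_VERSION=11.0.0 while targeting PYTORCH_ROCM_ARCH=gfx1150, and running code objects built for a different ISA than the physical GPU may surface as faults like the one above. A hedged sketch (assumes the rocminfo tool shipped with ROCm; returns None when it is not on PATH) to see what ISA the GPU actually reports:

```python
import re
import shutil
import subprocess

def detect_gfx_target():
    """Return the first gfx* ISA target rocminfo reports, or None if unavailable."""
    if shutil.which("rocminfo") is None:
        return None  # not inside a ROCm environment / tools not installed
    out = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
    match = re.search(r"\bgfx[0-9a-f]+\b", out)
    return match.group(0) if match else None

print(detect_gfx_target())
```

If this prints something other than the target the kernels were built for, the override is papering over a real mismatch, which is one plausible source of "Page not present" faults.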
WARNING 03-15 00:20:43 [gpt_oss_triton_kernels_moe.py:56] Using legacy triton_kernels on ROCm
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297]
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297] █ █ █▄ ▄█
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.17.1rc1.dev113+g2cdf92228
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297] █▄█▀ █ █ █ █ model google/gemma-3-4b-it
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=1) INFO 03-15 00:20:46 [utils.py:297]
Why does it say something about gpt_oss?
The warning about gpt_oss_triton_kernels appears because vLLM's ROCm backend shares some kernel code (such as MoE and quantization kernels) between different models, including GPT-OSS and Gemma. This message does not mean you are running a GPT-OSS model; it only indicates that the legacy Triton kernels (originally developed for GPT-OSS) are being used for certain backend operations on ROCm, even when serving Gemma-3-4b-it. This is expected and not an error; it is simply a shared code path kept for efficiency and compatibility (source).
Would you like more detail on how vLLM shares backend kernels across models?
I would like to know why this error happens on the HX 370.
(EngineCore pid=164) INFO 03-15 00:21:25 [parallel_state.py:1395] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://192.168.50.145:40069 backend=nccl
(EngineCore pid=164) INFO 03-15 00:21:25 [parallel_state.py:1717] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
Memory access fault by GPU node-1 (Agent handle: 0x3d69a810) on address 0x7bd7a2601000. Reason: Page not present or supervisor privilege.
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
This "Memory access fault by GPU node-1 … Reason: Page not present or supervisor privilege" error on the HX 370 indicates that the GPU tried to access memory it should not have, often due to a driver, ROCm, or hardware issue, or a bug in the model/kernel code. It is not specific to vLLM; it is a low-level GPU/ROCm fault (troubleshooting reference). Common causes include an incompatible ROCm version, insufficient permissions, or a kernel bug.
Would you like more detail on how to debug or resolve this on ROCm/HX370?
Yes, help me debug this on the HX 370.
The “Memory access fault by GPU node-1 … Reason: Page not present or supervisor privilege” error on HX370 is a GPU memory access violation, typically caused by ROCm driver issues, hardware incompatibility, or a bug in the model or kernel code. This is a low-level GPU/ROCm error, not specific to vLLM, and is often seen when the ROCm version, PyTorch, or kernel modules are not fully compatible with your hardware (troubleshooting guide).
To debug: check that your ROCm version matches your HX370 hardware and PyTorch build, ensure all drivers are up to date, and verify you have the correct permissions. Try running a simple PyTorch ROCm test to confirm basic GPU functionality. If the error persists, try a different ROCm version or update vLLM to the latest version. Would you like step-by-step instructions for these checks?
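The "simple PyTorch ROCm test" mentioned above can be as small as a matmul forced onto the GPU; on ROCm builds of PyTorch the device is still driven through the torch.cuda API. A minimal sketch that degrades gracefully when torch or a GPU is absent:

```python
import importlib.util

def gpu_smoke_test():
    """Tiny matmul on the GPU; a memory access fault would surface at the sync."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "no GPU visible to PyTorch"
    x = torch.randn(256, 256, device="cuda")
    y = (x @ x).sum()
    torch.cuda.synchronize()  # forces the kernel to actually run
    return f"OK (checksum {y.item():.1f}, hip={torch.version.hip})"

print(gpu_smoke_test())
```

If this crashes with the same "Memory access fault" inside the container, the problem is below vLLM (ROCm runtime, kernel driver, or ISA mismatch); if it passes, the fault is more likely in a specific vLLM kernel path.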