vLLM 0.20.1, Radeon AI 9700, 1 CPU core at 100%

Hello there,

I’m serving an FP8 quant of gemma-4 E4B on a Radeon AI Pro 9700 with vLLM, and I find that things perform and scale wonderfully, so thanks a lot to the vLLM team!

However, even when no requests are processed, one CPU core is at 100% busy all the time (kept busy by a VLLM:EngineCore thread, apparently).

The setup of my test environment:

  • Ryzen 1600AF (basically a 2600), 6 cores/12 threads with 16 GB of RAM
  • Linux Mint 22.4 (Ubuntu 24.04 based)
  • one Radeon AI Pro 9700
  • ROCm 7.2.2

Installed vLLM 0.20.1 in a Python venv:

pip install vllm==0.20.1+rocm721 --extra-index-url https://wheels.vllm.ai/rocm/0.20.1/rocm721

There are old reports that read as related on the surface (issue #1698). However, this should have been solved first by pull #16226, which introduced VLLM_SLEEP_WHEN_IDLE=1, and then by another overhaul in pull #28053, which superseded that earlier patch as far as I can tell.

I wonder whether this is specific to the Radeon AI Pro 9700. There seem to be other problems with vLLM and ROCm on that GPU, e.g., issue #40980 (although that issue is about the GPU being at 100%, not the CPU, so I assume it’s unrelated).

(As a new user I can’t post more than two links, so I’m sorry that the issues and pull requests are not proper links.)

Any ideas on how to resolve this? Thanks!

The persistent 100% CPU usage by a VLLM:EngineCore thread when idle was previously addressed by introducing the VLLM_SLEEP_WHEN_IDLE=1 environment variable (see issue #16968 and PR #16226). This workaround was later superseded by a more comprehensive fix in PR #28053, which should be included in vLLM 0.20.1 and newer, making the environment variable unnecessary. If you are still seeing high CPU usage on vLLM 0.20.1+rocm721, this may indicate a regression or a hardware/ROCm-specific issue with the Radeon AI Pro 9700.

There are no reports in the provided data sources of this issue persisting on supported NVIDIA or AMD platforms after the aforementioned fixes. For further troubleshooting, ensure you are not running with legacy environment variables, and consider opening a new issue with detailed logs and system information, as recommended in the troubleshooting guide. Would you like a step-by-step breakdown of how to collect and submit diagnostic information for the vLLM team?
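The check for legacy environment variables can be sketched in Python. This is a minimal, Linux-only sketch: the helper names (parse_environ, vllm_settings, vllm_settings_of) are made up for illustration and are not part of vLLM. It reads /proc/<PID>/environ, which holds the environment the process was started with, and keeps only the VLLM_* entries.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE format used by /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skips the empty entry after the trailing NUL
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def vllm_settings(env: dict[str, str]) -> dict[str, str]:
    """Return only the variables whose name starts with VLLM_, sorted by name."""
    return {k: v for k, v in sorted(env.items()) if k.startswith("VLLM_")}


def vllm_settings_of(pid: int) -> dict[str, str]:
    """Read the startup environment of a running process (hypothetical helper)."""
    raw = Path(f"/proc/{pid}/environ").read_bytes()
    return vllm_settings(parse_environ(raw))
```

Calling vllm_settings_of(<PID>) for the server process (as the same user or as root) lists what was set at launch. An entry such as VLLM_SLEEP_WHEN_IDLE would be a leftover from the older workaround.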

Sources:

Hint: Mention @RunLLM in the post for followups.

Hi @RunLLM ,

It appears your response matches my research. If you can provide a guide on how to determine what the busy CPU core is doing and how to provide proper debugging information to the vLLM team, that’d be appreciated. Thanks!

To determine what the busy CPU core is doing and provide actionable debugging info to the vLLM team, you should (1) identify the process/thread responsible, (2) collect stack traces, and (3) gather environment and log details. According to the troubleshooting guide, you can use tools like htop or top to find the PID, then use strace -p <PID> or gdb -p <PID> to capture what the thread is executing. For Python-level insight, py-spy top -p <PID> or py-spy dump -p <PID> can show which Python functions are consuming CPU.

Additionally, collect the output of python collect_env.py (from the vLLM repo), your vLLM command line, and any relevant logs (especially with export VLLM_LOGGING_LEVEL=DEBUG). Attach these, along with stack traces, to a new GitHub issue as recommended in the troubleshooting docs. Would you like a step-by-step example of these commands?
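As a rough sketch of the first step (finding the responsible thread), the per-thread view that htop or top -H shows can be approximated in Python from /proc. This is Linux-only, and the function names are hypothetical. It samples the utime and stime counters in /proc/<PID>/task/<TID>/stat twice and reports CPU percent per thread over the interval.

```python
import os
import time
from pathlib import Path


def parse_stat(text: str) -> tuple[str, int]:
    """Return (thread name, utime + stime in clock ticks) from a /proc stat line."""
    # The name is wrapped in parentheses and may itself contain spaces or ')'.
    name = text[text.index("(") + 1 : text.rindex(")")]
    rest = text[text.rindex(")") + 2 :].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return name, int(rest[11]) + int(rest[12])


def thread_ticks(pid: int) -> dict[int, tuple[str, int]]:
    """Snapshot the CPU ticks of every thread of a process."""
    ticks = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            ticks[int(task.name)] = parse_stat((task / "stat").read_text())
        except (FileNotFoundError, ProcessLookupError):
            pass  # thread exited between listing and reading
    return ticks


def busiest_threads(pid: int, interval: float = 1.0) -> list[tuple[int, str, float]]:
    """Return (tid, name, CPU percent) per thread, busiest first."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    before = thread_ticks(pid)
    time.sleep(interval)
    after = thread_ticks(pid)
    usage = []
    for tid, (name, ticks) in after.items():
        if tid in before:
            delta = ticks - before[tid][1]
            usage.append((tid, name, 100.0 * delta / hz / interval))
    return sorted(usage, key=lambda row: row[2], reverse=True)
```

The thread ID at the top of busiest_threads(<PID>) can then be passed to strace -p, which accepts thread IDs as well as process IDs.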

@RunLLM

When looking into the process that keeps the CPU core 100% busy, the output consists of endless repeats of

ioctl(12, AMDKFD_IOC_WAIT_EVENTS, 0x7527a97fed00) = 0

It looks like this system call returns immediately instead of blocking for an appreciable amount of time, which may accidentally introduce busy-wait-like behavior.

(Edit: the output above is from strace -p <PID>.)
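One way to back up the "returns immediately" observation with numbers is to record the time spent in each call with strace -T (for example strace -T -p <PID> -o trace.txt) and summarize the result. The following is a minimal sketch under stated assumptions: the summarize function and the 0.1 ms threshold are arbitrary choices for illustration, and it only handles complete lines that end in a <seconds> timing.

```python
import re
from collections import defaultdict

# Optional "[pid N] " or "N " prefix, syscall name, and the trailing <seconds> from -T.
LINE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?P<name>\w+)\(.*<(?P<secs>\d+\.\d+)>\s*$"
)


def summarize(lines, fast_threshold=0.0001):
    """Per syscall: number of calls, calls faster than the threshold, total seconds."""
    stats = defaultdict(lambda: {"calls": 0, "fast": 0, "seconds": 0.0})
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # signals, unfinished/resumed calls, lines without -T timing
        entry = stats[match["name"]]
        secs = float(match["secs"])
        entry["calls"] += 1
        entry["seconds"] += secs
        if secs < fast_threshold:
            entry["fast"] += 1
    return dict(stats)
```

Feeding it the trace, e.g. summarize(open("trace.txt")), gives per-syscall counts. If nearly all AMDKFD_IOC_WAIT_EVENTS ioctl calls fall under the threshold while the server is idle, that supports the busy-wait reading and is a compact figure to include in a bug report.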

Your analysis is correct: if ioctl(AMDKFD_IOC_WAIT_EVENTS) returns immediately in a tight loop, it causes busy-waiting and high CPU usage. This suggests the ROCm driver or vLLM’s interaction with it is not blocking as intended, leading to the observed 100% CPU utilization. This behavior is not expected after the vLLM fixes for idle CPU usage, and appears to be specific to the AMD ROCm stack or the Radeon AI Pro 9700.

This detail, especially the exact syscall pattern, should be included in a new GitHub issue for the vLLM team, as recommended in the troubleshooting guide. There is no explicit mention of this ioctl behavior in the vLLM data sources, so it may be a new or hardware-specific bug.
