Sampler.hip:564:63: error: local memory (66032) exceeds limit (65536) in 'void vllm::topKPerRowDecode<1024, true, false, true>'

I am trying to build vllm on wsl2 with rocm as specified here: GPU - vLLM, however my build stops with the following error:

```
/home/joefe/repos/vllm/build/temp.linux-x86_64-cpython-312/csrc/sampler.hip:564:63: error: local memory (66032) exceeds limit (65536) in 'void vllm::topKPerRowDecode<1024, true, false, true>(float const*, int const*, int*, int, int, int, int, float*, int, int const*)'
  564 | static __global__ __launch_bounds__(kNumThreadsPerBlock) void topKPerRowDecode(
      |                                                               ^
16 warnings and 1 error generated when compiling for gfx1201.
```

ROCm (6.4) and PyTorch (2.9) appear to be working fine.
I am on WSL2 with a Radeon RX 9070 (16 GB).

I have also tried nightly ROCm 7 and nightly PyTorch 2.11.

collect_env: collect_env - Pastebin.com

Is this a bug or some kind of environment setup issue? Thanks.

This looks like a build-time resource limit, not an environment misconfiguration. The compiler is reporting that the kernel `topKPerRowDecode`, instantiated with 1024 threads per block, needs 66032 bytes of local memory (LDS), which exceeds the 65536-byte (64 KiB) per-workgroup limit enforced when compiling for your GPU architecture (gfx1201). Failures like this can occur when large kernels are compiled for certain AMD GPUs, especially with high thread counts or large shared-memory allocations, and are not specific to WSL2 or your ROCm version. The official vLLM documentation does not describe a workaround for this particular error, though similar build failures are discussed in the context of resource limits and kernel compilation (docs.vllm.ai ROCm install, github_issue #10855).
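To make the numbers concrete, here is a back-of-envelope sketch in plain Python. The per-thread byte figure is inferred from the error message, not read from the kernel source, and it assumes the kernel's local-memory footprint scales linearly with block size, which may not hold for the real implementation:

```python
# gfx1201 (RDNA 4) allows at most 64 KiB of LDS (local memory) per workgroup.
LDS_LIMIT = 64 * 1024  # 65536 bytes

def lds_footprint(threads_per_block: int, bytes_per_thread: float) -> float:
    """Hypothetical model: per-block local memory scales with block size."""
    return threads_per_block * bytes_per_thread

# The compiler reports 66032 bytes for the <1024, ...> instantiation,
# i.e. about 64.5 bytes of scratch per thread (inferred, not from source).
per_thread = 66032 / 1024

print(lds_footprint(1024, per_thread))  # 66032.0 -> over the 65536 limit
print(lds_footprint(512, per_thread))   # 33016.0 -> well under the limit
```

Under that (assumed) linear model, halving the threads per block would bring the footprint comfortably back under the 64 KiB limit, which is why reducing the block size is the first workaround worth trying.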

Possible workarounds are patching the kernel code to reduce its shared/local memory usage, lowering the number of threads per block, or waiting for an upstream fix. You could also try building on native Linux or with a different ROCm version, though neither is guaranteed to change the kernel's resource usage. Would you like more detail on how to attempt a workaround or patch the kernel code?
