cudaMallocHost issues

xhejtman · August 10, 2025, 10:36pm

While testing the sleep function in vLLM, I encountered a failure where torch.empty() was unable to allocate a CPU memory buffer for offloading a model from GPU.
The dmesg output reported errors such as:

Cannot map memory with base addr 0x77bff0000000 and size of 0x8000 pages

A small C reproducer using cudaMallocHost() showed that CUDA could not allocate a memory block larger than 2 GB in a single call. The root cause turned out to be a bug in the Linux kernel 6.11 series, addressed by John Hubbard's patch "[PATCH] mm/gup: restore the ability to pin more than 2GB at a time".
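The C reproducer itself is not included in this post. As a rough Python analogue, the sketch below asks PyTorch for a single pinned host buffer of a given size, which is the same kind of allocation that failed in the vLLM sleep path. The helper names `gib_to_bytes` and `try_pinned_alloc` are made up for illustration, and the sketch assumes a CUDA-enabled PyTorch install. PyTorch's host allocator may round requests up, so treat the exact threshold as approximate.

```python
def gib_to_bytes(size_gib: float) -> int:
    """Convert a size in GiB to a whole number of bytes."""
    return int(size_gib * 1024**3)


def try_pinned_alloc(size_gib: float) -> bool:
    """Try to allocate one pinned host buffer of size_gib GiB.

    Returns True on success and False if PyTorch raises a RuntimeError.
    Hypothetical helper, not part of vLLM or PyTorch.
    """
    # Imported lazily so gib_to_bytes() can be used without PyTorch installed.
    import torch

    try:
        # One allocation request for the whole buffer, in page-locked memory.
        buf = torch.empty(gib_to_bytes(size_gib), dtype=torch.uint8, pin_memory=True)
    except RuntimeError as exc:
        print(f"{size_gib} GiB pinned allocation failed: {exc}")
        return False
    del buf
    print(f"{size_gib} GiB pinned allocation succeeded")
    return True
```

Calling `try_pinned_alloc(1)` and then `try_pinned_alloc(4)` on an affected kernel would be expected to succeed for the first call and fail for the second, assuming the machine has enough free RAM for both.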

Ubuntu 24.04’s HWE kernel line previously used 6.11 (affected by this bug), but it has now moved to 6.14, which does not contain the issue. The original GA kernel for Ubuntu 24.04 (6.8) is also unaffected.

RunLLM · August 10, 2025, 10:36pm

Your findings are correct: the inability to allocate more than 2 GB of pinned memory in a single call (e.g., via torch.empty(..., pin_memory=True) or cudaMallocHost()) was due to a bug in the Linux kernel 6.11 series, as described in the patch “[PATCH] mm/gup: restore the ability to pin more than 2GB at a time - John Hubbard”. This bug affected memory offloading in vLLM and similar CUDA applications. Ubuntu 24.04’s HWE kernel line previously used 6.11 (affected), but has since moved to 6.14, which is not affected. The original Ubuntu 24.04 GA kernel (6.8) is also unaffected by this bug.

For vLLM users encountering this problem, moving off the 6.11 series to an unaffected kernel (such as the 6.14 HWE kernel or the 6.8 GA kernel) will resolve the pinned memory allocation issue. This is consistent with known limitations and workarounds for pinned memory in Linux and CUDA environments, as discussed in the vLLM community and official CUDA documentation (see vLLM issue #188).
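A quick way to tell whether a machine is on the affected series is to parse the kernel release string. The sketch below is only a heuristic based on what this thread reports (6.11 affected; 6.8 and 6.14 not). The names `kernel_series` and `is_affected` are illustrative, and a distribution may backport the fix into a 6.11 build, so a match is not proof that the bug is present.

```python
import platform
import re

# Per this thread: the 6.11 series carries the bug; 6.8 and 6.14 do not.
AFFECTED_SERIES = (6, 11)


def kernel_series(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '6.11.0-29-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(release: str) -> bool:
    """Return True if the release string belongs to the 6.11 series."""
    return kernel_series(release) == AFFECTED_SERIES


def check_running_kernel() -> bool:
    """Report on the kernel this process is running under (Linux only)."""
    release = platform.release()
    affected = is_affected(release)
    print(f"kernel {release}: {'6.11 series, possibly affected' if affected else 'not in the 6.11 series'}")
    return affected
```

Running `check_running_kernel()` on the host (not inside a VM with a different kernel) prints the release and the verdict; containers share the host kernel, so the result inside a container reflects the host.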

Would you like more detail on how this impacts vLLM or how to check your kernel version?

Sources:

vLLM issue #188
