While testing the sleep function in vLLM, I encountered a failure where torch.empty() was unable to allocate a CPU memory buffer for offloading a model from GPU.
The dmesg output reported errors such as:
Cannot map memory with base addr 0x77bff0000000 and size of 0x8000 pages
Ubuntu 24.04’s HWE kernel line previously used 6.11 (affected by this bug), but it has now moved to 6.14, which does not contain the issue. The GA (general availability) kernel for Ubuntu 24.04 (6.8) is also unaffected.
Your findings are correct: the inability to allocate more than 2GB of pinned memory (e.g., via torch.empty(..., pin_memory=True) or cudaMallocHost()) was caused by a bug in the Linux 6.11 kernel series, as described in John Hubbard's patch “[PATCH] mm/gup: restore the ability to pin more than 2GB at a time”. This bug affected memory offloading in vLLM and similar CUDA applications. Ubuntu 24.04’s HWE kernel line previously used 6.11 (affected), but has since moved to 6.14, which does not have the issue. The Ubuntu 24.04 GA kernel (6.8) is also unaffected by this bug.
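To confirm whether a given machine hits the limit, a small probe like the one below may help. It is only a sketch: it assumes PyTorch is installed with CUDA support, the helper names (probe_sizes, pinned_alloc_ok) are made up for illustration, and the 1/2/3 GiB sizes are simply chosen to straddle the 2GB boundary named in the patch title.

```python
GIB = 1 << 30  # bytes in one GiB


def probe_sizes():
    """Allocation sizes (in bytes) that straddle the 2GB boundary."""
    return [1 * GIB, 2 * GIB, 3 * GIB]


def pinned_alloc_ok(num_bytes):
    """Return True if a pinned CPU buffer of num_bytes can be allocated.

    Hypothetical helper: torch is imported lazily so the rest of the
    module loads even where PyTorch is not installed.
    """
    import torch  # assumption: PyTorch built with CUDA support

    if not torch.cuda.is_available():
        raise RuntimeError("pinned memory requires a working CUDA setup")
    try:
        # One byte per element, so numel equals the size in bytes.
        buf = torch.empty(num_bytes, dtype=torch.uint8, pin_memory=True)
    except RuntimeError:
        # Allocation failures surface as RuntimeError; dmesg may have details.
        return False
    return buf.is_pinned()


def run_probe():
    """Try each probe size and report whether pinning succeeded."""
    for size in probe_sizes():
        status = "ok" if pinned_alloc_ok(size) else "FAILED"
        print(f"{size // GIB} GiB pinned allocation: {status}")
```

Calling run_probe() on a host with enough free RAM and comparing the result with dmesg output should indicate whether larger pinned allocations fail; a failure alone does not prove this particular kernel bug is the cause.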
For vLLM users encountering this problem, switching to a kernel outside the 6.11 series, such as the current HWE kernel (6.14) or the 6.8 kernel that Ubuntu 24.04 originally shipped with, will resolve the pinned memory allocation issue. This is consistent with known limitations and workarounds for pinned memory in Linux and CUDA environments, as discussed in the vLLM community and official CUDA documentation (see vLLM issue #188).
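As a quick way to check which kernel series is running, the following sketch parses the release string reported by the OS. The function names are hypothetical, and the check is deliberately coarse: it flags any 6.11.x release, since this thread only identifies the 6.11 series as affected and does not say whether a particular distribution build carries a backported fix.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a string like '6.11.0-29-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected_series(release):
    """True if the release belongs to the 6.11 series discussed above."""
    return parse_kernel_release(release) == (6, 11)


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints on Linux
    if is_affected_series(running):
        print(f"{running}: 6.11 series, may be affected by the pinning bug")
    else:
        print(f"{running}: not in the 6.11 series")
```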
Would you like more detail on how this impacts vLLM or how to check your kernel version?