Deallocating None when using engine.sleep?

In my RL program I call `engine.sleep` frequently, and I often hit the error `Fatal Python error: none_dealloc: deallocating None`. Do you have any ideas on how to resolve or avoid this issue?

Here are some related issues:

This error (`Fatal Python error: none_dealloc: deallocating None`) is likely caused by a bug in vLLM's sleep mode, specifically the interaction between Python's garbage-collection order and the CUDA memory allocator. When `engine.sleep` is called frequently, vLLM's custom memory management interacts poorly with PyTorch's `CUDAPluggableAllocator`, leading to segmentation faults or double-free errors during cleanup, especially in distributed or RL settings.

The issue is reproducible in pure vLLM environments and has been reported across multiple vLLM versions since 0.8.0, but not in 0.7.3. There is currently no official fix; the suggested workaround is to avoid sleep mode, or to reimplement it using dispatch mode for finer control over tensor allocation and deallocation. Setting `VLLM_USE_V1=0` can make the error more apparent but does not resolve it, and attempts to clean up manually with `atexit` or by skipping garbage collection for certain tensors have not been successful so far. See the detailed discussion and reproduction steps in vLLM issue #16993.
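A pure-Python sketch of the failure mode (the names `AllocCallback` and `c_side_handle` are illustrative stand-ins, not vLLM internals): a raw function pointer handed to native code does not keep its Python wrapper alive, so an unlucky collection order leaves the C side calling into freed memory.

```python
import gc
import weakref

class AllocCallback:
    """Stand-in for a Python callback wrapped for the CUDA allocator."""
    def __call__(self, size):
        return size

# A raw C function pointer is invisible to Python's reference counting;
# a weakref models that C-side handle.
cb = AllocCallback()
c_side_handle = weakref.ref(cb)

# The last Python-side reference disappears during cleanup (the order of
# such teardowns is exactly what sleep mode cannot control)...
del cb
gc.collect()

# ...and the C side now holds a dangling pointer: calling through it from
# native code crashes, surfacing as errors like none_dealloc.
print(c_side_handle() is None)  # → True: the callback object is already gone
```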

Would you like a step-by-step breakdown of the technical details and possible workarounds?

@youkaichao do you know how to solve it? I think it is a bug related to sleep mode and gc collection

According to vLLM issue #16993, the root cause is indeed Python's garbage-collection order: sleep mode interacts poorly with PyTorch's CUDA memory allocator. The maintainer suggests reimplementing sleep mode with dispatch mode (instead of the current memory-pool approach), but this is non-trivial. There is no official fix yet, and neither manual `gc.collect()` nor `atexit` cleanup resolves the error.

Would you like more detail on the technical background or possible temporary workarounds?

After removing all the manual `gc.collect()` logic, the occasional `none_dealloc` error still occurs. @youkaichao could you please take a look and see how to resolve it?

As per vLLM issue #16993, removing the manual `gc.collect()` calls does not resolve the `none_dealloc` error. The maintainer (@youkaichao) confirms it is due to Python's unpredictable garbage-collection order, which breaks vLLM's sleep mode together with PyTorch's CUDA allocator. There is currently no official fix; the only suggested direction is to reimplement sleep mode using dispatch mode for tighter control over memory, but that is not yet available.
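A minimal sketch of why manual cleanup cannot help (`SleepBuffer` is a made-up stand-in, not a vLLM class): CPython frees an object the instant its last reference dies, so where you call `gc.collect()`, or what you register with `atexit`, does not change the teardown order that triggers the crash.

```python
import atexit
import gc
import weakref

destroyed = []

class SleepBuffer:
    """Stand-in for state kept alive only through the allocator's pool."""

buf = SleepBuffer()
weakref.finalize(buf, destroyed.append, "SleepBuffer")

gc.collect()                   # a manual collection beforehand is a no-op here
atexit.register(lambda: None)  # and atexit hooks run far too late to pin it

del buf  # reference counting destroys the object right here, unconditionally
print(destroyed)  # → ['SleepBuffer']
```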

Would you like a more detailed explanation of why this happens or guidance on possible mitigations?

Should be fixed by vllm-project/vllm#23477: "[Bugfix] Add strong reference to CUDA pluggable allocator callbacks" by 22quinn.