What situations could cause the 'Memory usage increased after sleeping' problem?

Hi, I’m using the verl framework to train Qwen3-30B-A3B. The training job ran fine for more than 200 steps, but at step 223 it hit the ‘Memory usage increased after sleeping’ error. It’s a bit puzzling and I don’t know how to fix it. Can anybody give me some clues? Thanks!

My environment is as follows:
hardware: Huawei Ascend 910B NPUs, 8 nodes × 8 NPUs each, 64 NPUs in total
CANN: 8.2.RC1
Python: 3.10.18
vllm: 0.9.1
vllm-ascend: 0.9.1rc3
torch: 2.5.1
torch-npu: 2.5.1.post1

Here is the error log:
File "/cache/verl_algo/verl/single_controller/ray/base.py", line 766, in func
return getattr(self.worker_dict[key], name)(*args, **kwargs)
File "/cache/verl_algo/verl/single_controller/base/decorator.py", line 430, in inner
return func(*args, **kwargs)
File "/cache/verl_algo/verl/utils/profiler/mstx_profile.py", line 210, in wrapper
return func(self, *args, **kwargs)
File "/cache/verl_algo/verl/workers/fsdp_workers.py", line 762, in generate_sequences
with self.rollout_sharding_manager:
File "/cache/verl_algo/verl/utils/profiler/performance.py", line 105, in f
return self.log(decorated_function, *args, **kwargs)
File "/cache/verl_algo/verl/utils/profiler/performance.py", line 118, in log
output = func(*args, **kwargs)
File "/cache/verl_algo/verl/workers/sharding_manager/fsdp_vllm.py", line 240, in __exit__
self.inference_engine.sleep(level=1)
File "/cache/verl_env/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 1322, in sleep
self.llm_engine.sleep(level=level)
File "/cache/verl_env/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1860, in sleep
self.model_executor.sleep(level=level)
File "/cache/verl_env/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 207, in sleep
self.collective_rpc("sleep", kwargs=dict(level=level))
File "/cache/verl_env/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
File "/cache/verl_env/lib/python3.10/site-packages/vllm/utils.py", line 2671, in run_method
return func(*args, **kwargs)
File "/cache/verl_env/lib/python3.10/site-packages/vllm_ascend/worker/worker.py", line 202, in sleep
assert freed_bytes >= 0, "Memory usage increased after sleeping."
AssertionError: Memory usage increased after sleeping.

The error “Memory usage increased after sleeping” is raised by an assertion in vLLM’s sleep mode, which expects device memory usage to drop after the sleep call. The worker records memory usage before and after sleeping and asserts that freed_bytes is non-negative; if the runtime reports more memory in use afterwards, the assertion fires. This can happen because of device-specific memory-management quirks, especially on non-NVIDIA hardware or with custom backends such as the Ascend NPU plugin. Similar problems have been reported and discussed in the vLLM community, including on H100 and other non-standard environments. A common workaround is to relax the assertion and let the process continue, since the memory may still be released at the driver level even if PyTorch or the runtime does not immediately reflect it. See the discussion and workaround in vllm-project/vllm#9232 and vllm-project/vllm/pull/11743.
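For context, the failing check looks roughly like the sketch below. This is a simplified illustration, not the actual vllm-ascend source: the real logic lives at the line shown in the traceback (vllm_ascend/worker/worker.py, line 202), release_kv_and_weights is a hypothetical stand-in for the plugin's real offload/discard step, and it assumes torch_npu mirrors the CUDA API with torch.npu.mem_get_info().

```python
# Simplified sketch of the check behind the AssertionError (assumptions:
# torch.npu.mem_get_info() is available via torch_npu, and
# release_kv_and_weights stands in for the real offload/discard step).
import torch
import torch_npu  # noqa: F401  (registers the torch.npu backend)


def sleep(release_kv_and_weights, level: int = 1) -> None:
    free_before, total = torch.npu.mem_get_info()

    # In vLLM's sleep mode, level 1 offloads weights and discards the KV
    # cache; level 2 discards both.
    release_kv_and_weights(level)

    free_after, _ = torch.npu.mem_get_info()

    # freed_bytes is how much free memory the device gained by sleeping.
    # If the runtime reports *less* free memory than before, it goes
    # negative and the assertion below fails with the message in the log.
    freed_bytes = free_after - free_before
    assert freed_bytes >= 0, "Memory usage increased after sleeping."
```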

For Ascend NPU, this assertion may not be reliable due to differences in how memory is managed and reported. You can try modifying the sleep function to skip or log (rather than assert) when freed_bytes < 0, as suggested in the linked issues. This approach has been used successfully on other hardware where the assertion is too strict. However, be aware that this is a workaround and may mask underlying memory management issues. Always monitor actual device memory usage with external tools to ensure stability.
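If you go the workaround route, a minimal sketch is below. It assumes you are patching your local copy of vllm_ascend/worker/worker.py at the line in the traceback; check_freed_bytes is a hypothetical helper used only to show the shape of the change, and in practice you would replace the assert in place.

```python
# Minimal sketch of a lenient replacement for the hard assert (assumption:
# applied to your local vllm-ascend install at the line the traceback shows).
import logging

logger = logging.getLogger(__name__)


def check_freed_bytes(freed_bytes: int) -> None:
    """Warn instead of asserting when the post-sleep memory reading is negative."""
    if freed_bytes < 0:
        logger.warning(
            "Memory usage increased by %.2f MiB after sleeping; continuing anyway. "
            "Cross-check real device usage with npu-smi or torch.npu.memory_allocated().",
            -freed_bytes / (1024 * 1024),
        )
    # freed_bytes >= 0 keeps the original, expected behaviour: memory was freed.
```

Either way, keep an eye on npu-smi info around the rollout sleep/wake transitions so a genuine leak is not hidden by the relaxed check.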

Would you like a step-by-step guide on how to patch the code or further details on why this happens?

Hint: Mention @RunLLM in the post for followups.
