python+vllm: Out of memory

luxferre12345 · May 26, 2026, 7:55am

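**Background:** I deployed an OCR parsing service built on the Python Flask framework on a server. It calls the vLLM inference acceleration framework over HTTP and uses the GPU for OCR parsing.

Versions: PaddleOCR 3.4.0, with PaddleOCR-VL-1.5-0.9B served through the vLLM inference acceleration framework.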

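**Problem:** This server is used only for OCR parsing tasks, yet OOM still occurs even with concurrency limited to 1. Once the inference acceleration service is started, about 5 GB of GPU memory is in use even when no task is running. After starting the Python service and running a single parsing task to completion, top shows that the memory held by the Python code and the vLLM service is not reclaimed but stays occupied, so the service hits OOM after running for a while. The exact error message is: [Out of memory: Killed process 36797 (python3) total-vm:45950116kB, anon-rss:9253320kB, file-rss:57544kB, shmem-rss:2304kB, UID:1000 pgtables:46404kB oom_score_adj:0]
Server configuration:

Memory usage:

Is there any way to solve this OOM problem?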
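The quoted message has the shape of a Linux kernel OOM-killer log line, which reports host RAM held by the killed process rather than GPU memory. As a minimal sketch (the helper names `parse_oom_line` and `kib_to_gib` are hypothetical, introduced here for illustration), the counters can be converted to GiB to see how much resident memory the python3 process held when it was killed:

```python
import re

# Kernel OOM-killer line quoted in the post above.
OOM_LINE = (
    "Out of memory: Killed process 36797 (python3) total-vm:45950116kB, "
    "anon-rss:9253320kB, file-rss:57544kB, shmem-rss:2304kB, UID:1000 "
    "pgtables:46404kB oom_score_adj:0"
)


def parse_oom_line(line: str) -> dict:
    """Extract the pid, the process name and every '<field>:<n>kB' counter."""
    head = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if head is None:
        raise ValueError("not an OOM-killer line")
    counters = {name: int(kb) for name, kb in re.findall(r"([\w-]+):(\d+)kB", line)}
    return {"pid": int(head.group(1)), "name": head.group(2), "kb": counters}


def kib_to_gib(kb: int) -> float:
    """The kernel prints these counters in units of 1024 bytes."""
    return kb / (1024 * 1024)


info = parse_oom_line(OOM_LINE)
# Resident memory = anonymous + file-backed + shared-memory pages.
rss_kb = sum(info["kb"][key] for key in ("anon-rss", "file-rss", "shmem-rss"))
print(f"{info['name']} (pid {info['pid']}) resident: {kib_to_gib(rss_kb):.2f} GiB")
print(f"virtual size: {kib_to_gib(info['kb']['total-vm']):.2f} GiB")
```

For this log line the resident total comes to roughly 8.9 GiB, almost all of it anonymous memory (heap and similar allocations) of the python3 process. That suggests comparing this figure against the server's total RAM before tuning GPU-side settings.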

RunLLM · May 26, 2026, 7:55am

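From your description, the memory of the vLLM and Python processes is not released after an OCR inference task finishes, which leads to OOM after the service has been running for a long time. This is fairly common when a multimodal model such as PaddleOCR-VL is combined with vLLM. The main cause is that vLLM's multimodal cache and memory management do not reclaim memory promptly, especially under high-frequency or large-batch image inference.

Suggestions: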

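Limit the multimodal cache: when starting the vLLM service, add --mm-processor-cache-gb 0, or set the environment variable VLLM_MM_INPUT_CACHE_GIB=0, to disable or limit the multimodal input cache and stop memory from growing continuously.
Lower GPU memory utilization: reduce --gpu-memory-utilization (for example to 0.8 or lower) to leave room for temporary variables and caches and reduce the OOM risk.
Limit maximum concurrency and batch size: set --max-num-seqs and --max-num-batched-tokens to small values so that a single request cannot take too much memory.
Disable unused multimodal channels: if you only run image OCR, pass --limit-mm-per-prompt.video 0 so that no memory is allocated for unused channels such as video.
Restart the service periodically: if the memory leak cannot be avoided entirely, restarting the vLLM service on a schedule works as a temporary mitigation.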
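The flags listed above could be combined into a single launch command along these lines. This is only a sketch: it assumes the server is started directly with `vllm serve` and a vLLM version that accepts every flag named above, the helper `build_vllm_serve_args` is hypothetical, the model identifier is a placeholder taken from the thread, and the numeric values are illustrative rather than tuned.

```python
import shlex


def build_vllm_serve_args(
    model: str,
    *,
    port: int = 8118,
    gpu_memory_utilization: float = 0.8,  # illustrative value
    max_num_seqs: int = 4,                # illustrative value
    max_num_batched_tokens: int = 8192,   # illustrative value
    mm_processor_cache_gb: int = 0,       # 0 disables the multimodal cache
) -> list[str]:
    """Assemble the argv for `vllm serve` from the suggestions above."""
    return [
        "vllm", "serve", model,
        "--port", str(port),
        "--mm-processor-cache-gb", str(mm_processor_cache_gb),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        "--max-num-seqs", str(max_num_seqs),
        "--max-num-batched-tokens", str(max_num_batched_tokens),
        # Image-only OCR: reserve nothing for video inputs.
        "--limit-mm-per-prompt.video", "0",
    ]


# Placeholder model identifier; replace it with the path or id actually deployed.
args = build_vllm_serve_args("PaddleOCR-VL-1.5-0.9B")
print(shlex.join(args))
```

Building the command as a list and printing it with `shlex.join` keeps the quoting correct if the model path contains spaces. The same list can be passed to `subprocess.Popen` without going through a shell.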

Would you like a detailed example of the parameter settings, or further troubleshooting steps?


luxferre12345 · May 26, 2026, 8:05am

nohup paddleocr genai_server --model_name PaddleOCR-VL-1.5-0.9B --backend vllm --port 8118 --host 0.0.0.0 --mm-processor-cache-gb 0 > vlm_server.log 2>&1 &

usage: paddleocr [-h] [-v]
{doc_preprocessor,doc_understanding,formula_recognition_pipeline,ocr,doc_parser,pp_chatocrv4_doc,pp_doctranslation,pp_structurev3,seal_recognition,table_recognition_v2,chart_parsing,doc_img_orientation_classification,doc_vlm,formula_recognition,layout_detection,seal_text_detection,table_cells_detection,table_classification,table_structure_recognition,text_detection,text_image_unwarping,textline_orientation_classification,text_recognition,install_hpi_deps,install_genai_server_deps,genai_server}
…
paddleocr: error: unrecognized arguments: --mm-processor-cache-gb 0
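The error above comes from the paddleocr command-line parser, which does not know vLLM's own flags. One possible workaround, sketched here under the unverified assumption that `paddleocr genai_server` accepts a `--backend_config` option pointing to a YAML file of backend engine options (check `paddleocr genai_server --help` before relying on it), is to move the vLLM settings into such a file. The file name `vllm_config.yaml`, the helper `write_backend_config`, and the option values are all illustrative, and whether a given key is accepted depends on the installed vLLM version.

```python
import shlex
from pathlib import Path

# Assumed vLLM engine options: key names mirror the vLLM CLI flags without "--".
backend_options = {
    "mm-processor-cache-gb": 0,
    "gpu-memory-utilization": 0.8,
    "max-num-seqs": 4,
}


def write_backend_config(path: str, options: dict) -> Path:
    """Write flat scalar options as a 'key: value' YAML mapping."""
    target = Path(path)
    lines = [f"{key}: {value}" for key, value in options.items()]
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


config_path = write_backend_config("vllm_config.yaml", backend_options)

# Same command as in the post, with the rejected flag replaced by the config file.
# "--backend_config" is an assumption, not confirmed by this thread.
command = [
    "paddleocr", "genai_server",
    "--model_name", "PaddleOCR-VL-1.5-0.9B",
    "--backend", "vllm",
    "--port", "8118",
    "--host", "0.0.0.0",
    "--backend_config", str(config_path),
]
print(shlex.join(command))
```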
