How to run DeepSeek OCR 2 on vllm 0.23.0?

dingfen · June 23, 2026, 10:40am

I tried to run DeepSeek OCR 2 with vLLM 0.23.0 on H20, using this command:

vllm serve DeepSeek-OCR-2 --host 0.0.0.0 --port 8999  --tensor-parallel-size 2 --pipeline-parallel-size 1 --trust-remote-code --served-model-name DeepSeek-OCR-2 --enable-prefix-caching &

But I got:

Profiling CUDA graph memory PIECEWISE=51 (largest=512), FULL=51 (largest=512)

Worker proc VllmWorker-1 died unexpectedly, shutting down executor

Executor: waiting for worker exit count=2
Executor: all workers exited gracefully

RunLLM · June 23, 2026, 10:40am

DeepSeek-OCR-2 is known to crash on H20 GPUs with vLLM 0.20.0 and later, during engine startup or CUDA graph profiling. This is a recurring issue: users have reported that both the latest and nightly vLLM Docker images fail to serve DeepSeek-OCR-2 on H20, with logs showing worker processes dying unexpectedly and no clear root cause in the traceback. There is currently no confirmed workaround or fix for this model on H20 with vLLM 0.23.0, either in the official documentation or in the issue tracker. See the discussion and logs in the issue linked under Sources.
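Since the log stops at CUDA graph memory profiling, one way to narrow this down is to relaunch with CUDA graph capture disabled and more verbose logging. The following is only a troubleshooting sketch, not a confirmed fix: it reuses the arguments from the original command and adds `--enforce-eager`, together with the `VLLM_LOGGING_LEVEL` and `NCCL_DEBUG` environment variables. The `RUN` switch and the `vllm-debug.log` file name are hypothetical helpers introduced for this example.

```shell
#!/usr/bin/env bash
# Troubleshooting sketch (assumption: the crash is tied to CUDA graph profiling).
# This is not a confirmed workaround for DeepSeek-OCR-2 on H20.

# Same arguments as the original command, plus --enforce-eager.
VLLM_ARGS=(
  serve DeepSeek-OCR-2
  --host 0.0.0.0 --port 8999
  --tensor-parallel-size 2 --pipeline-parallel-size 1
  --trust-remote-code
  --served-model-name DeepSeek-OCR-2
  --enable-prefix-caching
  --enforce-eager   # run in eager mode, skipping CUDA graph capture
)

# More verbose logs from vLLM and NCCL, to surface why the worker died.
export VLLM_LOGGING_LEVEL=DEBUG
export NCCL_DEBUG=INFO

# RUN is a hypothetical switch for this sketch: by default the command is only
# printed; set RUN=1 to actually launch the server in the foreground.
if [[ "${RUN:-0}" == "1" ]]; then
  vllm "${VLLM_ARGS[@]}" 2>&1 | tee vllm-debug.log
else
  echo "vllm ${VLLM_ARGS[*]}"
fi
```

If the server starts in eager mode, that points at the CUDA graph capture step rather than model loading. If it still dies, the debug log is what to attach to the issue. Eager mode is typically slower, so treat it as a diagnostic step rather than a production setting.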

Would you like more detail on possible troubleshooting steps or alternative configurations?

Sources:

https://github.com/vllm-project/vllm/issues/41468

