How to run DeepSeek OCR 2 on vllm 0.23.0?

I just tried to run DeepSeek OCR 2 on vLLM 0.23.0 on H20 GPUs, using this command:

vllm serve DeepSeek-OCR-2 --host 0.0.0.0 --port 8999  --tensor-parallel-size 2 --pipeline-parallel-size 1 --trust-remote-code --served-model-name DeepSeek-OCR-2 --enable-prefix-caching &

But I got:

Profiling CUDA graph memory PIECEWISE=51 (largest=512), FULL=51 (largest=512)

Worker proc VllmWorker-1 died unexpectedly, shutting down executor

Executor: waiting for worker exit count=2
Executor: all workers exited gracefully

DeepSeek-OCR-2 is reported to crash on H20 GPUs with recent vLLM versions (0.20.0 and later) during engine startup or CUDA graph profiling, which matches the point where your log stops. Users have reported that both the latest and nightly vLLM Docker images fail to serve the model on H20: worker processes die unexpectedly, and the traceback shows no clear root cause. There is currently no confirmed workaround or fix for vLLM 0.23.0 on H20 in the official documentation or the issue tracker. See the discussion and logs in this issue.
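
Since the log stops right after the CUDA graph memory profiling line, one way to narrow the problem down is to relaunch with variations of the original command. The sketch below only builds the command lines; it does not start vLLM. It rests on assumptions and is not a confirmed fix: `--enforce-eager` is vLLM's flag for disabling CUDA graph capture, so it may or may not avoid the crash, and the single-GPU variant assumes the model fits on one H20. The helper names `eager_variant` and `single_gpu_variant` are made up for illustration.

```python
import shlex

# Arguments copied from the command in the question.
BASE_ARGS = [
    "vllm", "serve", "DeepSeek-OCR-2",
    "--host", "0.0.0.0",
    "--port", "8999",
    "--tensor-parallel-size", "2",
    "--pipeline-parallel-size", "1",
    "--trust-remote-code",
    "--served-model-name", "DeepSeek-OCR-2",
    "--enable-prefix-caching",
]


def eager_variant(args):
    """Return a copy of args with --enforce-eager appended.

    Diagnostic only: eager mode skips CUDA graph capture, which is the
    phase the log reaches before the worker dies.
    """
    out = list(args)
    if "--enforce-eager" not in out:
        out.append("--enforce-eager")
    return out


def single_gpu_variant(args):
    """Return a copy of args with tensor parallelism set to 1.

    Assumption: the model fits on a single GPU. This separates
    multi-GPU problems from model or kernel problems.
    """
    out = list(args)
    idx = out.index("--tensor-parallel-size")
    out[idx + 1] = "1"
    return out


if __name__ == "__main__":
    # Print each variant so it can be pasted into a shell and run in the
    # foreground (without the trailing "&") to keep the full traceback visible.
    print(shlex.join(eager_variant(BASE_ARGS)))
    print(shlex.join(single_gpu_variant(BASE_ARGS)))
    print(shlex.join(single_gpu_variant(eager_variant(BASE_ARGS))))
```

If the eager variant starts cleanly, CUDA graph capture is the likely trigger. If only the single-GPU variant starts, the multi-GPU path is the more likely suspect. Either result is worth adding to the issue report.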

Would you like more detail on possible troubleshooting steps or alternative configurations?
