When I run the DeepSeek MTP draft model with pp=1, tp=8, my terminal just hangs.
DeepSeek MTP (Multi-Token Prediction) draft models currently do not support pipeline parallelism (PP) greater than 1; only PP=1 is supported for the draft model. Setting the draft model to pp=1 with tp=8 is the correct configuration. If you instead set pp>1 for the draft model, vLLM will hang or error, because pipeline parallelism for the draft model is not implemented yet.

This limitation is explicitly mentioned in vLLM discussions and PRs: users have reported hangs or errors when trying PP>1 for the draft model, even though TP>1 is supported and tested for DeepSeek MTP draft modules. The target model does support pipeline parallelism, but the draft (MTP) model must remain at pp=1 for now. See the discussion in vllm PR #15173 and vllm PR #12755.
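For reference, a minimal sketch of a multi-GPU launch along these lines, assuming a recent vLLM build where speculative decoding is configured via `--speculative-config` (the flag name, the `deepseek_mtp` method string, and the model path are assumptions here; older versions used different speculative-decoding flags, so check `vllm serve --help` for your installed version):

```shell
# Hypothetical launch sketch: TP=8 for both target and draft,
# pipeline parallelism left at 1 (the only supported value for
# the MTP draft model). Adjust the model path for your setup.
vllm serve deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --pipeline-parallel-size 1 \
  --speculative-config '{"method": "deepseek_mtp", "num_speculative_tokens": 1}'
```

If the target model needs PP>1 across nodes, that is fine; the restriction applies only to the draft (MTP) module, which stays at pp=1.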
Would you like more detail on how to configure DeepSeek MTP with vLLM for multi-GPU or multi-node setups?