When I run the DeepSeek MTP draft model with pp=1, tp=8, my terminal just hangs.
DeepSeek MTP (Multi-Token Prediction) draft models currently do not support pipeline parallelism (PP) greater than 1; only PP=1 is supported for the draft model. Setting the draft model to pp=1 with tp=8 is the correct configuration. If you instead set pp>1 for the draft model, vLLM will hang or error, because pipeline parallelism for the draft model is not implemented yet.

This limitation is explicitly mentioned in vLLM discussions and PRs: users have reported hangs or errors when trying PP>1 for the draft model, even though TP>1 is supported and tested for DeepSeek MTP draft modules. The target model does support pipeline parallelism, but the draft (MTP) model must remain at pp=1 for now. See the discussion in vllm PR #15173 and vllm PR #12755.
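For reference, a minimal sketch of a multi-GPU launch along these lines, assuming a recent vLLM build where speculative decoding is configured via `--speculative-config` (the flag name, the `deepseek_mtp` method string, and the model path are assumptions here; older versions used different speculative-decoding flags, so check `vllm serve --help` for your installed version):

```shell
# Hypothetical launch sketch: TP=8 for both target and draft,
# pipeline parallelism left at 1 (the only supported value for
# the MTP draft model). Adjust the model path for your setup.
vllm serve deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --pipeline-parallel-size 1 \
  --speculative-config '{"method": "deepseek_mtp", "num_speculative_tokens": 1}'
```

If the target model needs PP>1 across nodes, that is fine; the restriction applies only to the draft (MTP) module, which stays at pp=1.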
Would you like more detail on how to configure DeepSeek MTP with vLLM for multi-GPU or multi-node setups?