Pipeline Parallelism Support - Source Code Location

egbeysel · March 25, 2025, 3:06pm

I'd like to understand how pipeline parallelism support is implemented in vLLM, so I'm looking for the source code that adapts the model to pipeline-parallel inference. Any pointers to where this code lives would be very welcome.

In other words, where does the model get partitioned into pipeline stages when we run `vllm serve gpt2 --tensor-parallel-size 4 --pipeline-parallel-size 2`? Or does the model already have to be in a compatible form?

comaniac · March 25, 2025, 4:07pm

This function does the layer partitioning and could be your starting point: vllm/distributed/utils.py in the vllm-project/vllm repository on GitHub, at commit d0cfec7ab919bfab261db419e17c768f08a24dc8.
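To give a rough idea of what layer partitioning means here, below is a minimal sketch that assigns each pipeline rank a contiguous range of layers. It is not vLLM's code: `partition_layers` and its parameter names are hypothetical, and the even split (with any leftover layers going to the earliest stages) is an assumption for illustration. The linked file is the place to check the actual policy.

```python
from typing import Tuple


def partition_layers(num_layers: int, pp_rank: int, pp_size: int) -> Tuple[int, int]:
    """Return the half-open range [start, end) of layers owned by one stage.

    Hypothetical helper for illustration only; not taken from vLLM.
    """
    if pp_size <= 0:
        raise ValueError("pp_size must be positive")
    if not 0 <= pp_rank < pp_size:
        raise ValueError("pp_rank must be in [0, pp_size)")

    # Every stage gets `base` layers; the first `remainder` stages get one more.
    base, remainder = divmod(num_layers, pp_size)
    start = pp_rank * base + min(pp_rank, remainder)
    end = start + base + (1 if pp_rank < remainder else 0)
    return start, end


if __name__ == "__main__":
    # Assumes a 12-layer model (as in base GPT-2) split across 2 stages.
    for rank in range(2):
        start, end = partition_layers(num_layers=12, pp_rank=rank, pp_size=2)
        print(f"stage {rank}: layers [{start}, {end})")
```

With a split like this, each rank only needs to build and run the layers in its own range and pass the hidden states on to the next rank, so the checkpoint itself does not need to be in a special pre-partitioned form.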
