Is it possible to configure the order of the pipeline in multi-node deployments?

JorgenTrondsen · October 16, 2025, 6:55pm

From what I understand, the environment variable VLLM_PP_LAYER_PARTITION manually sets the layer distribution among the nodes in the pipeline, e.g., node 0: 20, node 1: 10, node 2: 20. However, how is the order of the pipelined nodes decided, and is this something that can be configured?

RunLLM · October 16, 2025, 6:55pm

Yes, the VLLM_PP_LAYER_PARTITION environment variable lets you manually specify how many layers each pipeline parallel (PP) stage (node) gets, e.g., VLLM_PP_LAYER_PARTITION=20,10,20 for three stages. However, the order of pipeline stages (i.e., which node gets which partition) is determined by the order in which vLLM assigns pipeline ranks to nodes, not by user configuration. There is no documented way to directly control or configure the mapping of pipeline ranks to specific nodes; vLLM handles this internally based on Ray’s resource allocation and the order nodes join the cluster.

If you have nodes with different memory sizes and want to ensure a specific node gets a specific partition, you must ensure the node order in the Ray cluster matches your intended partitioning, but vLLM does not provide a direct configuration for this mapping. This limitation is noted in user discussions and issues, and the only user-exposed control is the partition list itself, not the node-to-partition mapping order. See VLLM_PP_LAYER_PARTITION docs and related issue discussion.
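As a rough illustration of what the partition list means, the sketch below turns a VLLM_PP_LAYER_PARTITION value into per-rank layer ranges. The helper `layer_range` and the 50-layer model are hypothetical and only mirror the behavior described above; this is not vLLM's own code, and the variable has to be set in the environment before vLLM starts for it to have any effect.

```python
import os


def layer_range(partition: str, pp_rank: int, num_layers: int) -> tuple[int, int]:
    """Return the [start, end) layer indices for one pipeline rank.

    Hypothetical helper for illustration; not part of vLLM's API.
    """
    counts = [int(x) for x in partition.split(",")]
    if sum(counts) != num_layers:
        raise ValueError(f"partition {counts} does not sum to {num_layers} layers")
    if not 0 <= pp_rank < len(counts):
        raise ValueError(f"pp_rank {pp_rank} is outside 0..{len(counts) - 1}")
    start = sum(counts[:pp_rank])
    return start, start + counts[pp_rank]


# Three pipeline stages over an assumed 50-layer model.
os.environ["VLLM_PP_LAYER_PARTITION"] = "20,10,20"
partition = os.environ["VLLM_PP_LAYER_PARTITION"]

ranges = [layer_range(partition, rank, num_layers=50) for rank in range(3)]
print(ranges)  # [(0, 20), (20, 30), (30, 50)]
```

The list only says how many layers each pipeline rank holds; which physical node ends up with rank 0, 1, or 2 is decided separately.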

Would you like more detail on how Ray assigns resources or how to work around this limitation?

Sources:

VLLM_PP_LAYER_PARTITION docs


JorgenTrondsen · October 16, 2025, 7:19pm

How is pipeline rank assigned to nodes? Is it just the order in which nodes connect to the head? @RunLLM

RunLLM · October 16, 2025, 7:20pm

Pipeline ranks are assigned to nodes based on the order in which Ray allocates workers and resources, not strictly the order in which nodes connect to the head. vLLM sorts workers so that those on the same node as the driver (head) come first, then those on nodes with fewer workers, and finally by IP address. This means the mapping is deterministic but depends on Ray’s scheduling and the network/IP configuration, not on user-specified node order or connection timing. There is no user-facing option to directly control which node gets which pipeline rank; it is handled internally by vLLM and Ray’s resource manager. See ray_distributed_executor.py and API docs.
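The three-level ordering described above can be sketched as a sort key. This is a minimal sketch, not the code from ray_distributed_executor.py: the function name `sort_workers` and the example IPs are made up, IPs are compared as plain strings (an assumption about the real behavior), and it assumes one worker per node with tensor parallel size 1, so that a worker's position in the sorted list lines up with its pipeline rank.

```python
from collections import Counter


def sort_workers(worker_ips: list[str], driver_ip: str) -> list[str]:
    """Order workers: driver node first, then nodes with fewer workers, then by IP.

    Hypothetical helper for illustration only.
    """
    ip_counts = Counter(worker_ips)
    return sorted(
        worker_ips,
        key=lambda ip: (
            0 if ip == driver_ip else 1,  # workers on the driver's node come first
            ip_counts[ip],                # then nodes hosting fewer workers
            ip,                           # then IP, compared as a string here
        ),
    )


# One worker per node; the driver runs on 10.0.0.2.
ordered = sort_workers(["10.0.0.3", "10.0.0.1", "10.0.0.2"], driver_ip="10.0.0.2")
print(ordered)  # ['10.0.0.2', '10.0.0.1', '10.0.0.3']
```

Under these assumptions, with VLLM_PP_LAYER_PARTITION=20,10,20 the first 20 layers would land on 10.0.0.2, the next 10 on 10.0.0.1, and the last 20 on 10.0.0.3. Note that string comparison puts "10.0.0.10" before "10.0.0.2", so check the resulting order on your own cluster before relying on it.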

Would you like a step-by-step breakdown of the worker sorting and assignment logic?
