Is it possible to configure the order of the pipeline in multi-node deployments?

From what I understand, the environment variable VLLM_PP_LAYER_PARTITION manually sets the layer distribution among the nodes in the pipeline, e.g., node 0: 20 layers, node 1: 10, node 2: 20. However, how is the order of the pipelined nodes decided, and can it be configured?

Yes, the VLLM_PP_LAYER_PARTITION environment variable lets you manually specify how many layers each pipeline parallel (PP) stage (node) gets, e.g., VLLM_PP_LAYER_PARTITION=20,10,20 for three stages. However, the order of the pipeline stages (i.e., which node gets which partition) is determined by the order in which vLLM assigns pipeline ranks to nodes, not by user configuration. There is no documented way to directly control the mapping of pipeline ranks to specific nodes; vLLM handles this internally, based on Ray's resource allocation and the order in which nodes join the cluster.
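To make the partition semantics concrete, here is a minimal sketch of how a comma-separated partition string can be turned into a per-stage layer range. It is modeled on my understanding of vLLM's internal layer-splitting logic (the list length must equal the pipeline size, and the counts must sum to the model's layer count); the function name `layer_range_for_stage` is hypothetical, not a vLLM API. The key point is that stage `pp_rank` always receives the `pp_rank`-th entry of the list, so whichever node ends up with pipeline rank 0 gets the first 20 layers.

```python
def layer_range_for_stage(partition: str, pp_rank: int, pp_size: int,
                          num_layers: int) -> tuple[int, int]:
    """Return the [start, end) layer indices owned by pipeline stage `pp_rank`.

    Hypothetical helper: mirrors the idea behind VLLM_PP_LAYER_PARTITION,
    where the i-th comma-separated entry is the layer count of stage i.
    """
    counts = [int(x) for x in partition.split(",")]
    if len(counts) != pp_size:
        raise ValueError(f"{len(counts)} entries but pipeline size is {pp_size}")
    if sum(counts) != num_layers:
        raise ValueError(f"entries sum to {sum(counts)}, model has {num_layers} layers")
    start = sum(counts[:pp_rank])  # layers owned by all earlier stages
    return start, start + counts[pp_rank]


if __name__ == "__main__":
    # Example from the question: 50 layers split 20,10,20 over 3 stages.
    for rank in range(3):
        print(rank, layer_range_for_stage("20,10,20", rank, 3, 50))
    # 0 (0, 20)
    # 1 (20, 30)
    # 2 (30, 50)
```

The partition string only says "stage i gets N layers"; it says nothing about which physical node is stage i, which is why the node-to-rank order matters.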

If your nodes have different memory sizes and you want a specific node to get a specific partition, you have to make sure the node order in the Ray cluster matches your intended partitioning; vLLM does not provide a direct configuration for this mapping. This limitation comes up in user discussions and issues: the only user-exposed control is the partition list itself, not the node-to-partition mapping. See VLLM_PP_LAYER_PARTITION docs and related issue discussion.

Would you like more detail on how Ray assigns resources or how to work around this limitation?

Hint: Mention @RunLLM in the post for followups.

How is the pipeline rank assigned to nodes? Is it just the order in which nodes connect to the head? @RunLLM

Pipeline ranks are assigned to nodes based on the order in which Ray allocates workers and resources, not strictly the order in which nodes connect to the head. vLLM sorts the workers so that those on the same node as the driver (head) come first, then workers on nodes hosting fewer workers, and finally by IP address. The mapping is therefore deterministic, but it depends on Ray's scheduling and the network/IP configuration, not on a user-specified node order or on connection timing. There is no user-facing option to directly control which node gets which pipeline rank; it is handled internally by vLLM and Ray's resource manager. See ray_distributed_executor.py and API docs.
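The sorting rule above can be sketched as follows. This is a simplified model, not vLLM's code: the `Worker` class and `assign_pp_stages` function are hypothetical, the sort key follows the description in the answer (driver node first, then nodes with fewer workers, then IP), and it assumes no data parallelism with tensor parallelism as the innermost dimension, so each consecutive block of `tp_size` ranks forms one pipeline stage.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str  # hypothetical identifier for one Ray worker (one GPU)
    ip: str         # IP of the node the worker was scheduled on


def assign_pp_stages(workers: list[Worker], driver_ip: str,
                     tp_size: int) -> dict[str, int]:
    """Map each worker to a pipeline stage (assumes DP=1, TP innermost)."""
    ip_counts = Counter(w.ip for w in workers)  # workers per node

    def sort_key(w: Worker):
        # 0 sorts before 1, so workers on the driver's node come first;
        # ties are broken by workers-per-node, then by the IP string.
        return (0 if w.ip == driver_ip else 1, ip_counts[w.ip], w.ip)

    ordered = sorted(workers, key=sort_key)
    # Global rank = position in the sorted list; each block of tp_size
    # consecutive ranks is one pipeline stage.
    return {w.worker_id: rank // tp_size for rank, w in enumerate(ordered)}


if __name__ == "__main__":
    # 3 nodes x 2 GPUs, TP=2, PP=3; the driver runs on 10.0.0.9.
    workers = [Worker(f"{ip}-gpu{g}", ip)
               for ip in ("10.0.0.2", "10.0.0.10", "10.0.0.9") for g in range(2)]
    stages = assign_pp_stages(workers, driver_ip="10.0.0.9", tp_size=2)
    print({w.ip: stages[w.worker_id] for w in workers})
    # {'10.0.0.2': 2, '10.0.0.10': 1, '10.0.0.9': 0}
```

Note that this sketch compares IPs as plain strings, so "10.0.0.10" sorts before "10.0.0.2"; if vLLM also compares them as strings, the resulting order may not match what you would expect numerically. In this model, the levers that influence the order are which node runs the driver (it always gets stage 0) and the node IPs; connection order does not appear in the key.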

Would you like a step-by-step breakdown of the worker sorting and assignment logic?
