Understanding Multi Node Parallelization

RunLLM · May 13, 2026, 9:08am

To serve a large model across multiple servers (multi-node, multi-GPU) with vLLM, follow these steps:

1. Prepare the environment on all nodes:

Ensure every node uses the same Docker image and Python environment, and that the model files are at the same path on each node (use a shared filesystem or copy the model to each node).
Recommended: use Docker for consistency[^1^][^2^].
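
As a rough preflight check for step 1, something like the sketch below could be run on every node before the cluster is started. The function name check_model_dir is hypothetical (it is not part of vLLM), and the check assumes a Hugging Face-style model directory that contains a config.json file; adapt it if your model is stored differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vLLM: confirm the model directory
# exists on this node and looks like a Hugging Face model directory.
# Assumption: such a directory contains a config.json file.
check_model_dir() {
    local model_dir="$1"
    if [ ! -d "$model_dir" ]; then
        echo "missing directory: $model_dir" >&2
        return 1
    fi
    if [ ! -f "$model_dir/config.json" ]; then
        echo "no config.json in: $model_dir" >&2
        return 1
    fi
    echo "ok: $model_dir"
}
```

Calling check_model_dir /path/to/model on each node and getting "ok" everywhere suggests the path is consistent; it does not verify that the file contents are identical across nodes.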

2. Start a Ray cluster (recommended backend):

On the head node, run:

bash run_cluster.sh \
    vllm/vllm-openai \
    <HEAD_NODE_IP> \
    --head \
    /path/to/huggingface/home/on/this/node \
    -e VLLM_HOST_IP=<HEAD_NODE_IP>

On each worker node, run:

bash run_cluster.sh \
    vllm/vllm-openai \
    <HEAD_NODE_IP> \
    --worker \
    /path/to/huggingface/home/on/this/node \
    -e VLLM_HOST_IP=<WORKER_NODE_IP>

Keep these shells open; closing them will terminate the cluster[^1^][^2^][^3^].

3. Launch vLLM distributed serving:

Enter a container on any node (e.g., docker exec -it <container_name> /bin/bash).
Then run vLLM with the parallelism flags. For example, for 2 nodes with 8 GPUs each:

vllm serve /path/to/model \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --distributed-executor-backend ray

Adjust --tensor-parallel-size and --pipeline-parallel-size so that their product equals the total number of GPUs in your cluster (here, 8 × 2 = 16)[^1^][^2^][^3^].
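
One way to keep the two flags in step with the cluster shape is to derive them from the node and GPU counts. The sketch below assumes the common convention that matches the example above: tensor-parallel size equals the number of GPUs per node, and pipeline-parallel size equals the number of nodes. The function name build_serve_cmd is hypothetical, and it only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a "vllm serve" command for a given cluster shape.
# Assumed convention: tensor-parallel-size = GPUs per node,
#                     pipeline-parallel-size = number of nodes.
build_serve_cmd() {
    local model_path="$1" gpus_per_node="$2" num_nodes="$3"
    printf 'vllm serve %s --tensor-parallel-size %s --pipeline-parallel-size %s --distributed-executor-backend ray\n' "$model_path" "$gpus_per_node" "$num_nodes"
}

# Example: the 2-node, 8-GPU-per-node cluster from the steps above.
build_serve_cmd /path/to/model 8 2
```

Printing the command first makes it easy to review the flags before pasting them into the container shell.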

Would you like a Docker Compose example or more details on networking and troubleshooting?

Sources:

