How to use data-parallel in vLLM

As the title says, I want to know how to use data parallelism in vLLM.

To use data parallelism in vLLM for online serving, launch the server with the --data-parallel-size argument. For offline inference, pass --dp-size to the offline inference example script. For multi-node setups, also specify --data-parallel-address, --data-parallel-rpc-port, and --data-parallel-size-local so that ranks can coordinate across nodes. Example for online serving:

vllm serve $MODEL --data-parallel-size 4

For multi-node:

# Node 0
vllm serve $MODEL --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-address <HEAD_NODE_IP> --data-parallel-rpc-port 13345
# Node 1
vllm serve $MODEL --headless --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 2 --data-parallel-address <HEAD_NODE_IP> --data-parallel-rpc-port 13345
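
For larger clusters, the per-node commands can be generated rather than typed by hand. The following is a minimal sketch, not part of vLLM: `dp_launch_commands` is a hypothetical helper, the model name and IP address are placeholders, and it assumes every node hosts the same number of local ranks, so that each non-head node's start rank is its node index times the local size (this generalizes the two-node example above). It uses only the flags shown in that example.

```python
import shlex


def dp_launch_commands(model, dp_size, dp_size_local, head_ip, rpc_port):
    """Build one `vllm serve` command string per node (hypothetical helper).

    Assumes every node runs exactly `dp_size_local` data-parallel ranks.
    """
    if dp_size % dp_size_local != 0:
        raise ValueError("dp_size must be a multiple of dp_size_local")

    commands = []
    for node in range(dp_size // dp_size_local):
        args = ["vllm", "serve", model]
        if node > 0:
            # Non-head nodes run without the API server.
            args.append("--headless")
        args += [
            "--data-parallel-size", str(dp_size),
            "--data-parallel-size-local", str(dp_size_local),
        ]
        if node > 0:
            # First global DP rank hosted on this node.
            args += ["--data-parallel-start-rank", str(node * dp_size_local)]
        args += [
            "--data-parallel-address", head_ip,
            "--data-parallel-rpc-port", str(rpc_port),
        ]
        commands.append(shlex.join(args))
    return commands


# Placeholder model name and head-node IP; substitute your own values.
for cmd in dp_launch_commands("my-org/my-model", 4, 2, "10.0.0.1", 13345):
    print(cmd)
```

Each printed line is meant to be run on its own node, with the first line on the head node.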

See Data Parallel Deployment for full details.

I want to know how to use data parallelism for offline inference, not for deployment.

@machine You can refer to this document to enable data-parallelism for offline inference: Data Parallel — vLLM
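
As a rough illustration of the idea behind offline data parallelism: each rank typically runs its own engine process over its own share of the prompts. The sketch below covers only that sharding step in plain Python. `shard_prompts` is a hypothetical helper, not a vLLM API, and how each rank then creates its engine and generates from its shard is left to the linked document.

```python
def shard_prompts(prompts, dp_rank, dp_size):
    """Return the contiguous slice of `prompts` assigned to `dp_rank`.

    Hypothetical helper: when the prompts do not divide evenly, the
    remainder is spread over the lowest-numbered ranks.
    """
    base, extra = divmod(len(prompts), dp_size)
    start = dp_rank * base + min(dp_rank, extra)
    end = start + base + (1 if dp_rank < extra else 0)
    return prompts[start:end]


prompts = [f"prompt {i}" for i in range(10)]
dp_size = 4

# In a real run, each rank would be a separate process that computes only
# its own shard and hands it to its own engine instance.
shards = [shard_prompts(prompts, rank, dp_size) for rank in range(dp_size)]
for rank, shard in enumerate(shards):
    print(rank, len(shard))
```

With 10 prompts and 4 ranks, ranks 0 and 1 receive three prompts each and ranks 2 and 3 receive two each, so every prompt is processed exactly once.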
