As the title says, I want to know how to use data parallelism in vLLM.
To use data parallelism in vLLM for online serving, launch the server with the --data-parallel-size argument. For offline inference, pass --dp-size to the offline inference example script. For multi-node setups, also specify --data-parallel-address, --data-parallel-rpc-port, and --data-parallel-size-local to coordinate ranks across nodes; every node other than the head node additionally needs --headless and --data-parallel-start-rank. Example for online serving:
vllm serve $MODEL --data-parallel-size 4
For multi-node:
# Node 0
vllm serve $MODEL --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-address <HEAD_NODE_IP> --data-parallel-rpc-port 13345
# Node 1
vllm serve $MODEL --headless --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 2 --data-parallel-address <HEAD_NODE_IP> --data-parallel-rpc-port 13345
See Data Parallel Deployment for full details.
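The relationship between the flags in the multi-node example can be sketched in Python. This is only an illustration: it assumes every node hosts the same number of local ranks, as in the example above, and the helper name `start_ranks` is hypothetical and not part of vLLM.

```python
def start_ranks(dp_size: int, dp_size_local: int) -> list[int]:
    """Return the --data-parallel-start-rank value for each node.

    Assumption: every node hosts the same number of local DP ranks.
    """
    if dp_size % dp_size_local != 0:
        raise ValueError("dp_size must be a multiple of dp_size_local")
    num_nodes = dp_size // dp_size_local
    # Node i owns global ranks [i * dp_size_local, (i + 1) * dp_size_local).
    return [node * dp_size_local for node in range(num_nodes)]


print(start_ranks(4, 2))  # [0, 2]
```

With --data-parallel-size 4 and --data-parallel-size-local 2, node 0 starts at rank 0 and node 1 at rank 2, which matches the --data-parallel-start-rank 2 in the Node 1 command.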
Would you like more detail or code examples?
I want to know how to use data parallelism for offline inference, not for deployment.
@machine You can refer to this document to enable data-parallelism for offline inference: Data Parallel — vLLM
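For offline inference, data parallelism usually means each rank runs its own engine on a disjoint slice of the prompts. Below is a minimal sketch of just the slicing step; `shard_prompts` is a hypothetical helper, not a vLLM API, and engine creation is left out because it needs GPUs and should follow the linked document.

```python
def shard_prompts(prompts: list[str], dp_size: int, dp_rank: int) -> list[str]:
    """Return the contiguous slice of prompts that one DP rank should process.

    The first (len(prompts) % dp_size) ranks each receive one extra prompt,
    so the shards differ in size by at most one.
    """
    floor, remainder = divmod(len(prompts), dp_size)

    def start(rank: int) -> int:
        return rank * floor + min(rank, remainder)

    return prompts[start(dp_rank):start(dp_rank + 1)]


prompts = [f"prompt {i}" for i in range(10)]
for rank in range(4):
    # In a real script, each rank would pass its shard to its own engine.
    print(rank, shard_prompts(prompts, dp_size=4, dp_rank=rank))
```

Contiguous slicing keeps each rank's outputs in the original prompt order, so results can be concatenated by rank afterwards.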