How do I enable --data-parallel-size on a single machine when using the vLLM CLI?
To enable data parallelism on a single machine with vLLM, use the --data-parallel-size argument in your vllm serve command. For example, to run with 4 data parallel ranks (requiring 4 GPUs), use:
vllm serve $MODEL --data-parallel-size 4
This will launch 4 engine processes, each on a separate GPU, behind a single API endpoint. You can also combine it with tensor parallelism (e.g., --data-parallel-size=4 --tensor-parallel-size=2 for 8 GPUs). For more details and multi-node options, see Data Parallel Deployment and Parallelism and Scaling.
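To make the combined layout concrete, here is a hedged sketch assuming one host with 8 GPUs (adjust the sizes to your hardware; behavior may vary by vLLM version):

# 4 data-parallel replicas, each sharded across 2 GPUs with tensor
# parallelism: 4 x 2 = 8 GPUs total on this machine
vllm serve $MODEL --data-parallel-size 4 --tensor-parallel-size 2

Data parallelism replicates the whole model to raise throughput, while tensor parallelism splits each replica across GPUs to fit larger models, so the product of the two sizes must not exceed your GPU count.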
Would you like more detail or example multi-node commands?
@qt2139 If you are talking about a single host, you need to use --data-parallel-size-local
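If I read the docs right, that would look something like the sketch below, assuming all 4 ranks run on the one host (flag availability depends on your vLLM version):

# 4 data-parallel ranks total, all of them local to this machine
vllm serve $MODEL --data-parallel-size 4 --data-parallel-size-local 4

On a single host the local size equals the total size; in multi-node setups the local size tells each node how many of the ranks it should run.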
Why do you want to enable data parallelism for a single machine?