How to enable --data-parallel-size on a single machine?

How do I enable --data-parallel-size on a single machine when using the vllm CLI?


To enable data parallelism on a single machine with vLLM, use the --data-parallel-size argument in your vllm serve command. For example, to run with 4 data parallel ranks (requiring 4 GPUs), use:

vllm serve $MODEL --data-parallel-size 4

This launches 4 engine processes, each on a separate GPU, behind a single API endpoint. You can also combine data parallelism with tensor parallelism: for example, --data-parallel-size=4 --tensor-parallel-size=2 needs 4 × 2 = 8 GPUs. For more details and multi-node options, see the "Data Parallel Deployment" and "Parallelism and Scaling" pages in the vLLM documentation.
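The GPU arithmetic above (data parallel ranks × tensor parallel size) can be sketched in Python. This is a minimal illustrative sketch, not part of vLLM: the helper names (build_serve_command, required_gpus, fits_on_machine) and the model name "my-org/my-model" are hypothetical, and it uses only the two flags mentioned in the answer. It just builds the argument list you would pass to vllm serve and checks it against the GPUs you have.

```python
import shlex


def build_serve_command(model: str, dp_size: int, tp_size: int = 1) -> list[str]:
    """Build a `vllm serve` argv for data parallelism on one machine.

    Hypothetical helper: uses only --data-parallel-size and
    --tensor-parallel-size, the flags mentioned in the answer.
    """
    if dp_size < 1 or tp_size < 1:
        raise ValueError("parallel sizes must be >= 1")
    cmd = ["vllm", "serve", model, "--data-parallel-size", str(dp_size)]
    if tp_size > 1:
        # Each data parallel rank is itself sharded across tp_size GPUs.
        cmd += ["--tensor-parallel-size", str(tp_size)]
    return cmd


def required_gpus(dp_size: int, tp_size: int = 1) -> int:
    """One engine per DP rank, each engine spanning tp_size GPUs."""
    return dp_size * tp_size


def fits_on_machine(dp_size: int, tp_size: int, available_gpus: int) -> bool:
    """True if the requested layout fits in the GPUs on this machine."""
    return required_gpus(dp_size, tp_size) <= available_gpus


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder model name.
    cmd = build_serve_command("my-org/my-model", dp_size=4, tp_size=2)
    print(shlex.join(cmd))
    print(required_gpus(4, 2), fits_on_machine(4, 2, available_gpus=8))
```

Passing the list to subprocess.run avoids shell quoting issues; shlex.join is only used here to print a command you could paste into a terminal.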

Would you like more detail or example multi-node commands?


@qt2139 If you are talking about a single host, you need to use --data-parallel-size-local

Why do you want to enable data parallelism for a single machine?