Command lines to run pp=2 on two nodes in vllm 0.10.x

Hi,
I am trying to run PP=2, TP=2 across two nodes with vLLM 0.10.2. I have read some articles saying I can use Ray to manage the cluster, but I would like to know whether it is also possible to do this with native vLLM, without Ray.

I tried a few command-line options, like `master-addr`, `master-port`, and `entrypoints.controller`, but vLLM reports that the parameters are not recognized.

I have two questions:
#1: Is running PP=2 across two nodes supported without Ray on vLLM 0.10.x?

#2: If yes, what are the correct parameters? Is there any documentation for the usage/command line?

Appreciate any comments.

  1. Running pipeline parallelism (PP > 1) across multiple nodes without Ray is not supported in vLLM 0.10.x. Multi-node distributed inference currently requires Ray as the runtime engine; native multiprocessing is only supported for single-node deployments. Parameters like master-addr, master-port, or entrypoints.controller are not recognized by vLLM and are not documented in the official sources for multi-node use without Ray. See distributed serving docs and parallelism scaling.

  2. For multi-node PP/TP, you must use Ray. The correct approach is to set up a Ray cluster (using the provided run_cluster.sh or KubeRay), then launch vLLM with --tensor-parallel-size and --pipeline-parallel-size as needed. There is no support for launching multi-node PP/TP with torchrun or similar native PyTorch methods in vLLM 0.10.x. See parallelism scaling documentation and github discussion.
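The Ray-based setup above can be sketched as follows. This is a minimal illustration, not a verified recipe: the head-node IP, port, and model name are placeholders, and the exact flags you need may vary with your vLLM version and hardware.

```shell
# On the head node: start Ray and note this node's IP
# (example IP below is a placeholder)
ray start --head --port=6379

# On the second node: join the cluster by pointing at the head node
ray start --address=192.168.1.10:6379

# Verify that both nodes (4 GPUs total for TP=2 x PP=2) are visible
ray status

# On the head node: launch vLLM with tensor and pipeline parallelism;
# TP=2 x PP=2 requires 4 GPUs across the Ray cluster
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2 \
    --pipeline-parallel-size 2 \
    --distributed-executor-backend ray
```

For containerized deployments, vLLM also ships a `run_cluster.sh` helper that wraps the `ray start` steps inside Docker; the manual commands above show what it does under the hood.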

Would you like more detail on how to set up Ray for multi-node vLLM?


Yes, please give details on how to set up Ray for multi-node vLLM.