Running vLLM multi-node data parallel with SLURM

Hello :slight_smile:

I understand the preferred method for multi-node inference is using a Ray server. I’ve been trying to manage a training run with vLLM as an inference server on a SLURM cluster.

I’m aiming for something like your Data Parallel Deployment tutorial, modifying this setup script into an sbatch script and using something like this for my vLLM server.

I haven’t had much luck yet and seem to be running into resource conflicts between the Ray server and the vLLM server, which are run as separate srun commands. I’m not quite sure how to avoid these conflicts whilst making sure they can still talk to each other.

Just wondering if there are any existing recipes/approaches for this setup?

Any pointers would be greatly appreciated :victory_hand:

The recommended approach for multi-node vLLM inference on SLURM is to launch a Ray cluster across your nodes and then start vLLM inside that Ray cluster. Resource conflicts typically occur when Ray and vLLM are started as separate jobs (or separate srun steps) and both try to claim the same GPUs. Best practice is to allocate all nodes/GPUs in a single SLURM job, bring up one Ray cluster across that allocation, and then launch vLLM from within it, so Ray manages all GPU resources and vLLM does not compete for them separately. See the official distributed serving guide and parallelism scaling guide for details.
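As a minimal sketch of that single-allocation pattern (node and GPU counts, the port, and any module/venv setup are assumptions to adapt to your cluster), the idea is that the Ray head and workers are started as overlapping srun steps inside one sbatch allocation, so nothing else has to compete with them for GPUs:

```bash
#!/bin/bash
#SBATCH --job-name=vllm-ray
#SBATCH --nodes=2                 # example values -- adjust to your cluster
#SBATCH --gres=gpu:8              # GPUs per node (assumption)
#SBATCH --exclusive

# All nodes/GPUs belong to this one job; everything below runs inside it.
nodes=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))
head_node=${nodes[0]}
head_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)
ray_port=6379                     # arbitrary choice

# Start the Ray head on the first node. --overlap lets later srun steps
# share the allocation instead of blocking on or stealing its resources.
srun --nodes=1 --ntasks=1 -w "$head_node" --overlap \
    ray start --head --node-ip-address="$head_ip" --port="$ray_port" \
    --num-gpus 8 --block &
sleep 10

# Start a Ray worker on every remaining node, still inside the same job.
for node in "${nodes[@]:1}"; do
    srun --nodes=1 --ntasks=1 -w "$node" --overlap \
        ray start --address="$head_ip:$ray_port" --num-gpus 8 --block &
done
sleep 10
```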

A typical recipe is:

1. Allocate all nodes/GPUs in one SLURM job.
2. Start the Ray head on one node and Ray workers on the others, all within that same job (as in the sketch above).
3. Launch vLLM from within the Ray cluster, not as a separate SLURM job (see the sketch below).

This avoids resource conflicts. Also, ensure all nodes have the same model files and environment. For a working example, see the run_cluster.sh script and related SLURM usage in this issue. Would you like a step-by-step example SLURM script?
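A rough sketch of that last step, continuing the script above (the model name, parallel sizes, and port are placeholders, and the exact flags should be checked against the distributed serving and data parallel docs for your vLLM version):

```bash
# Launch vLLM from the head node so it attaches to the existing Ray cluster
# instead of spawning its own workers and competing for GPUs.
srun --nodes=1 --ntasks=1 -w "$head_node" --overlap \
    vllm serve meta-llama/Llama-3.1-8B-Instruct \
        --distributed-executor-backend ray \
        --tensor-parallel-size 8 \
        --pipeline-parallel-size 2 \
        --port 8000
```

For data parallel specifically, the same pattern applies: the Data Parallel Deployment tutorial’s flags (e.g. `--data-parallel-size`) would replace the tensor/pipeline parallel flags above, but verify the flag names against your installed vLLM version.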
