No available shared memory broadcast block found in 60 seconds

I am running Qwen3.6-35B-A3B on a setup of 2 DGX Spark machines. With tensor parallel size 2 the model starts up without problems, but as soon as I send it a prompt it only prints this message:

(EngineCore pid=2696350) INFO 06-22 08:27:36 [shm_broadcast.py:681] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation, weight/kv cache quantization).

What makes the least sense to me is that the same model runs smoothly when I use pipeline parallel size 2 instead.

Here is the configuration that I am using:

vllm serve Qwen/Qwen3.6-35B-A3B --host 0.0.0.0 --port 8000 --distributed-executor-backend mp --tensor-parallel-size 2 --pipeline-parallel-size 1 --nnodes 2 --node-rank 0 --master-addr 192.168.10.2 --master-port 6340 --gpu-memory-utilization 0.85 --max-model-len 16384 --max-num-seqs 8

The "No available shared memory broadcast block found in 60 seconds" message with tensor parallelism (TP=2) across 2 DGX Spark nodes points to a distributed communication or synchronization problem rather than a model or hardware limitation. It is a common symptom when NCCL (the communication backend used for TP) cannot establish efficient cross-node GPU communication, often because of network, environment variable, or NCCL configuration issues. Pipeline parallelism may work because it only passes activations between stages point to point, whereas tensor parallelism runs all-reduce collectives across both nodes in every layer, so PP may not hit the same bottleneck.

Key troubleshooting steps:

  • Ensure you set all required NCCL environment variables for cross-node communication, such as NCCL_SOCKET_IFNAME, NCCL_IB_HCA, and possibly NCCL_P2P_DISABLE=1 if there are P2P issues.
  • Confirm that both nodes can communicate over the correct network interface and that firewalls are not blocking required ports.
  • Make sure you use the --headless flag on the follower node (--node-rank 1) and that the model is available at the same path on both nodes.
  • Check that your InfiniBand or high-speed interconnect is configured and working (look for [send] via NET/IB/GDRDMA in NCCL logs).
  • See the official multi-node setup guide and troubleshooting tips.
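
As a rough illustration of the steps above, here is a minimal sketch of what the environment setup and the follower-node command might look like. The interface name `eth0` is a placeholder (check `ip addr` for the interface that carries the 192.168.10.x network), `GLOO_SOCKET_IFNAME` and `NCCL_DEBUG=INFO` are additions not mentioned above, and the `IFACE` variable and `follower_cmd` helper are hypothetical names used only for this example. The script only prints the command so it can be reviewed before running it on the second node.

```shell
#!/usr/bin/env bash
# Sketch only: interface and HCA names are placeholders, not values from the post.
set -euo pipefail

# Assumption: replace eth0 with the interface on the 192.168.10.x network.
IFACE="${IFACE:-eth0}"
MASTER_ADDR="${MASTER_ADDR:-192.168.10.2}"  # taken from the original command
MASTER_PORT="${MASTER_PORT:-6340}"          # taken from the original command

# Pin NCCL (and Gloo, used for CPU-side coordination) to the same interface.
export NCCL_SOCKET_IFNAME="$IFACE"
export GLOO_SOCKET_IFNAME="$IFACE"
# Verbose NCCL logs show which transport is selected (e.g. NET/IB vs. NET/Socket).
export NCCL_DEBUG=INFO
# Optional, only if needed on your system (values are site-specific):
# export NCCL_IB_HCA=<your-hca-name>
# export NCCL_P2P_DISABLE=1

# Hypothetical helper: prints the follower launch command for the second node.
# It mirrors the original command, with --node-rank 1 and --headless.
follower_cmd() {
  echo vllm serve Qwen/Qwen3.6-35B-A3B \
    --distributed-executor-backend mp \
    --tensor-parallel-size 2 --pipeline-parallel-size 1 \
    --nnodes 2 --node-rank 1 --headless \
    --master-addr "$MASTER_ADDR" --master-port "$MASTER_PORT" \
    --gpu-memory-utilization 0.85 --max-model-len 16384 --max-num-seqs 8
}

# Dry run: print the command so it can be checked before launching.
follower_cmd
```

The same exported variables should also be set on the head node (`--node-rank 0`) before starting it. With `NCCL_DEBUG=INFO` enabled, the NCCL log lines on both nodes indicate which network transport was actually chosen.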

Would you like a step-by-step checklist or example launch scripts for multi-node tensor parallel vLLM?
