No available shared memory broadcast block found in 60 seconds

RaulPinto566 · June 22, 2026, 8:42am

I am running Qwen3.6-35B-A3B on a setup of 2 DGX Spark nodes. When I run this model with tensor parallel size 2, the model starts up fine, but when I send it a prompt it just logs this message:

(EngineCore pid=2696350) INFO 06-22 08:27:36 [shm_broadcast.py:681] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation, weight/kv cache quantization).

For me, the part that makes the least sense is that when I run the same model with pipeline parallel size 2 instead, it runs smoothly.

Here is the configuration that I am using:

vllm serve Qwen/Qwen3.6-35B-A3B --host 0.0.0.0 --port 8000 --distributed-executor-backend mp --tensor-parallel-size 2 --pipeline-parallel-size 1 --nnodes 2 --node-rank 0 --master-addr 192.168.10.2 --master-port 6340 --gpu-memory-utilization 0.85 --max-model-len 16384 --max-num-seqs 8

RunLLM · June 22, 2026, 8:43am

Your issue, "No available shared memory broadcast block found in 60 seconds" when using tensor parallelism (TP=2) across 2 DGX Spark nodes, most likely indicates a distributed communication or synchronization problem rather than a model or hardware limitation. It is a common symptom when NCCL (the communication backend used for TP) cannot establish efficient cross-node GPU communication, often because of network, environment variable, or NCCL configuration issues. Pipeline parallelism may still work because it only passes activations between stages at the stage boundary, whereas tensor parallelism runs all-reduce collectives inside every layer and is therefore much more sensitive to a slow or misconfigured interconnect.

Key troubleshooting steps:

Ensure you set all required NCCL environment variables for cross-node communication, such as NCCL_SOCKET_IFNAME, NCCL_IB_HCA, and possibly NCCL_P2P_DISABLE=1 if there are P2P issues.
Confirm that both nodes can communicate over the correct network interface and that firewalls are not blocking required ports.
Make sure you use the --headless flag on the follower node (--node-rank 1) and that the model is available at the same path on both nodes.
Check that your InfiniBand or high-speed interconnect is configured and working (look for [send] via NET/IB/GDRDMA in NCCL logs).
See the official multi-node setup guide and troubleshooting tips.
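
As a rough sketch of the first step above, the NCCL variables can be exported on both nodes before launching vLLM. The interface name `enp1s0f0np0` is a placeholder and does not come from this thread; substitute whichever interface carries the 192.168.10.x link (for example, as listed by `ip -br addr`). The helper name `set_nccl_env` is also just illustrative.

```shell
#!/bin/sh
# Sketch only: run on BOTH nodes in the shell that will start vLLM.

set_nccl_env() {
    iface="$1"                            # network interface used for node-to-node traffic
    export NCCL_SOCKET_IFNAME="$iface"    # pin NCCL socket/bootstrap traffic to this interface
    export NCCL_DEBUG=INFO                # log which transport NCCL picks (e.g. NET/IB vs NET/Socket)
    # Optional, only if the NCCL logs point to peer-to-peer problems:
    # export NCCL_P2P_DISABLE=1
    # Optional, only if a specific RDMA device must be selected (value is site-specific):
    # export NCCL_IB_HCA=<device name>
}

set_nccl_env "enp1s0f0np0"   # hypothetical interface name, replace with your own
echo "NCCL_SOCKET_IFNAME=$NCCL_SOCKET_IFNAME NCCL_DEBUG=$NCCL_DEBUG"
```

With `NCCL_DEBUG=INFO` set, the startup logs should show which transport was selected, which helps confirm whether the high-speed link is actually being used.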

Would you like a step-by-step checklist or example launch scripts for multi-node tensor parallel vLLM?
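
For reference, a minimal dry-run sketch of what the two launch commands might look like is shown here. It only prints the commands rather than running them. The head-node flags are copied from the original post and `--headless` comes from the advice above; leaving `--host`/`--port` off the follower is an assumption (a headless node does not serve the HTTP API), and the helper name `launch_cmd` is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the launch command for each node instead of executing it.

MASTER_ADDR="192.168.10.2"   # from the original post
MASTER_PORT="6340"           # from the original post

launch_cmd() {
    node_rank="$1"
    # Flags shared by both nodes (taken from the original command).
    cmd="vllm serve Qwen/Qwen3.6-35B-A3B --distributed-executor-backend mp"
    cmd="$cmd --tensor-parallel-size 2 --pipeline-parallel-size 1"
    cmd="$cmd --nnodes 2 --node-rank $node_rank"
    cmd="$cmd --master-addr $MASTER_ADDR --master-port $MASTER_PORT"
    cmd="$cmd --gpu-memory-utilization 0.85 --max-model-len 16384 --max-num-seqs 8"
    if [ "$node_rank" -eq 0 ]; then
        # Head node: also exposes the HTTP API.
        cmd="$cmd --host 0.0.0.0 --port 8000"
    else
        # Follower node: no API server (assumption: host/port are not needed here).
        cmd="$cmd --headless"
    fi
    echo "$cmd"
}

launch_cmd 0   # run the printed command on the head node
launch_cmd 1   # run the printed command on the follower node
```

The point of generating both commands from one place is to keep the parallelism, memory, and rendezvous settings identical on the two nodes, so that only the rank and the `--headless` flag differ.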

