I have 2x 5090s on an Epyc system. The latest vLLM shows this:
Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
Does this mean I'm running something wrong, and does it slow down inference? Is this because of Blackwell?
The warning “Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed” means vLLM could not enable its custom allreduce kernel because its peer-to-peer (P2P) GPU communication check failed. This is not necessarily an error: vLLM falls back to NCCL for communication, which may be somewhat slower in some multi-GPU scenarios (especially tensor parallelism), but inference still works correctly. The issue is not specific to Blackwell GPUs; it can also stem from system configuration, driver, or platform limitations, particularly certain PCIe topologies or ACS isolation on Epyc systems.
To suppress the warning, add --disable-custom-all-reduce to your vLLM command. If you want to maximize performance, check your system’s PCIe topology and BIOS settings (ACS isolation can block P2P), and ensure the latest NVIDIA drivers and NCCL are installed. For more details, see the vLLM troubleshooting guide.
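As a concrete example, a two-GPU tensor-parallel launch with the flag might look like this (the model name is a placeholder; substitute your own):

```shell
# --disable-custom-all-reduce silences the warning and forces the NCCL path.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2 \
  --disable-custom-all-reduce
```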
Would you like a step-by-step guide to diagnose or improve P2P support?
To diagnose and fix GPU P2P (peer-to-peer) issues on your setup (Epyc Siena, 2x 5090, Proxmox, Ubuntu 24.04 VM), follow these steps:
Check PCIe Topology: Run nvidia-smi topo -m inside your VM to see how the GPUs are connected. PIX or NV# between the GPUs indicates a direct, P2P-capable link; PHB, NODE, or SYS means traffic is routed through the CPU/host bridge, which is slower and may not support P2P.
Verify P2P in PyTorch: Run this Python snippet to check P2P support:
If it prints False, P2P is not available.
Check Proxmox/VM Settings: For P2P to work, the GPUs must be passed through as full PCIe devices (not vGPU). Note that the host IOMMU must stay enabled for VFIO passthrough, but ACS isolation on the relevant PCIe switch can force all GPU-to-GPU traffic through the root complex and break P2P. Enable UVM (Unified Memory) in Proxmox and ensure /dev/shm is mounted in the VM.
Driver and NCCL Version: Use the latest NVIDIA driver and NCCL (≥2.27.3) in your VM.
Disable ACS: In your Proxmox host BIOS/firmware, disable ACS (or relax it on the slots holding the GPUs) if possible, since ACS can break P2P. Keep IOMMU enabled if you rely on PCIe passthrough.
Test with NCCL: Run the NCCL test suite (nccl-tests) or the vLLM multi-GPU sanity-check script to confirm multi-GPU communication works.
Would you like a detailed step-by-step for any of these points?