I have 2x 5090s on an Epyc system. The latest vLLM shows this:
Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
Does this mean I'm running something wrong, and does it slow down inference? Is this because of Blackwell?
The warning “Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed” means vLLM could not enable its custom allreduce kernel because its peer-to-peer (P2P) GPU communication check failed. This is not necessarily an error: vLLM falls back to NCCL for communication, which may be somewhat slower in some multi-GPU scenarios (especially tensor parallelism), but inference still works correctly. The issue is not specific to Blackwell GPUs; it can also stem from system configuration, driver, or platform limitations, particularly certain PCIe topologies or ACS isolation on Epyc systems.
To suppress the warning, add --disable-custom-all-reduce to your vLLM command. If you want to maximize performance, check your system’s PCIe topology and BIOS settings (ACS isolation can block P2P), and ensure the latest NVIDIA drivers and NCCL are installed. For more details, see the vLLM troubleshooting guide.
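As a concrete example, a two-GPU tensor-parallel launch with the flag might look like this (the model name is a placeholder; substitute your own):

```shell
# --disable-custom-all-reduce silences the warning and forces the NCCL path.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2 \
  --disable-custom-all-reduce
```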
Would you like a step-by-step guide to diagnose or improve P2P support?
To diagnose and fix GPU P2P (peer-to-peer) issues on your setup (Epyc Siena, 2x 5090, Proxmox, Ubuntu 24.04 VM), follow these steps:
Check PCIe Topology: Run nvidia-smi topo -m inside your VM to see how the GPUs are connected. PIX or NV# between the GPUs indicates a direct, P2P-capable link; PHB, NODE, or SYS means traffic is routed through the CPU/host bridge, which is slower and may not support P2P.
Verify P2P in PyTorch: Run this Python snippet to check P2P support:
If it prints False, P2P is not available.
Check Proxmox/VM Settings: For P2P to work, the GPUs must be passed through as full PCIe devices (not vGPU). Note that the host IOMMU must stay enabled for VFIO passthrough, but ACS isolation on the relevant PCIe switch can force all GPU-to-GPU traffic through the root complex and break P2P. Enable UVM (Unified Memory) in Proxmox and ensure /dev/shm is mounted in the VM.
Driver and NCCL Version: Use the latest NVIDIA driver and NCCL (≥2.27.3) in your VM.
Disable ACS: In your Proxmox host BIOS/firmware, disable ACS (or relax it on the slots holding the GPUs) if possible, since ACS can break P2P. Keep IOMMU enabled if you rely on PCIe passthrough.
Test with NCCL: Run the NCCL test suite (nccl-tests) or the vLLM multi-GPU sanity-check script to confirm multi-GPU communication works.
Would you like a detailed step-by-step for any of these points?