It seems likely that you are correct: the RayService CRD must specify GPU resources (e.g., `nvidia.com/gpu: 8`) in the `resources` section of both the head and worker pod templates for GPUs to be allocated (KubeRay docs). For RDMA, the values of `NCCL_IB_HCA` and `NCCL_SOCKET_IFNAME` depend on the network interfaces provisioned by the Nvidia Network Operator, which is configured via the `NvidiaClusterPolicy` (formerly `NimClusterPolicy`). Note that the Network Operator's own CRD is named `NicClusterPolicy`, so check which resource name applies in your cluster.
- `NCCL_IB_HCA`: Set this to the device name of your InfiniBand/RDMA NIC, such as `mlx5_0`, `mlx5_1`, etc. You can find these names by running `ibv_devices` or `ibdev2netdev` inside a pod with RDMA access, or by inspecting the `spec.nicSelector` section of your `NvidiaClusterPolicy` (look for `device` or `rootDevices`).
- `NCCL_SOCKET_IFNAME`: Set this to the network interface name (e.g., `ib0`, `ens2f0`) that is mapped to the RDMA device. You can find it by running `ip a` or `ibdev2netdev` inside the pod, or by checking the `NvidiaClusterPolicy`'s `nicSelector` and `pfNames` fields.
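If `ibdev2netdev` is not installed in the pod, roughly the same device-to-interface mapping can usually be read from sysfs. The following is a minimal Python sketch, assuming the common Linux layout `/sys/class/infiniband/<hca>/device/net/<ifname>`; the function name `rdma_netdevs` and its `sysfs_root` parameter are hypothetical and exist only for illustration.

```python
from pathlib import Path


def rdma_netdevs(sysfs_root: str = "/sys/class/infiniband") -> dict[str, list[str]]:
    """Map each RDMA device (e.g. mlx5_0) to the netdevs on the same PCI function.

    Assumes the sysfs layout <sysfs_root>/<hca>/device/net/<ifname>.
    """
    root = Path(sysfs_root)
    mapping: dict[str, list[str]] = {}
    if not root.is_dir():
        return mapping  # no RDMA devices are visible from this pod
    for hca in sorted(root.iterdir()):
        net_dir = hca / "device" / "net"
        # A device with no associated netdev gets an empty list.
        names = sorted(p.name for p in net_dir.iterdir()) if net_dir.is_dir() else []
        mapping[hca.name] = names
    return mapping


if __name__ == "__main__":
    for hca, netdevs in rdma_netdevs().items():
        print(hca, "->", ", ".join(netdevs) or "(no netdev)")
```

An empty result most likely means the pod was not granted RDMA device access, which is worth checking before tuning any NCCL variables.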
Example: If your NvidiaClusterPolicy has
nicSelector:
  pfNames: ["ens2f0"]
  rootDevices: ["0000:3b:00.0"]
then `NCCL_SOCKET_IFNAME=ens2f0` and `NCCL_IB_HCA=mlx5_0` (if `mlx5_0` maps to `ens2f0`).
You can confirm the mapping with:
ibdev2netdev
which outputs lines like:
mlx5_0 port 1 ==> ens2f0 (Up)
So, set:
NCCL_IB_HCA=mlx5_0
NCCL_SOCKET_IFNAME=ens2f0
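The same derivation can be scripted. This is a minimal Python sketch, assuming `ibdev2netdev` prints lines in the `mlx5_0 port 1 ==> ens2f0 (Up)` form quoted above; the helper names `parse_ibdev2netdev` and `nccl_env` are hypothetical, and using only links reported as `Up` is a choice made for this example.

```python
import re

# One ibdev2netdev line, e.g. "mlx5_0 port 1 ==> ens2f0 (Up)"
LINE_RE = re.compile(r"^(\S+)\s+port\s+(\d+)\s+==>\s+(\S+)\s+\((\w+)\)\s*$")


def parse_ibdev2netdev(output: str) -> list[tuple[str, int, str, str]]:
    """Return (hca, port, netdev, state) for every line that matches."""
    rows = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m.group(1), int(m.group(2)), m.group(3), m.group(4)))
    return rows


def nccl_env(output: str) -> dict[str, str]:
    """Build NCCL_IB_HCA / NCCL_SOCKET_IFNAME from the links that are Up."""
    up = [
        (hca, netdev)
        for hca, _port, netdev, state in parse_ibdev2netdev(output)
        if state == "Up"
    ]
    if not up:
        raise ValueError("no RDMA link is Up")
    # De-duplicate while keeping the order in which devices were listed.
    hcas = list(dict.fromkeys(hca for hca, _ in up))
    ifnames = list(dict.fromkeys(netdev for _, netdev in up))
    return {
        "NCCL_IB_HCA": ",".join(hcas),
        "NCCL_SOCKET_IFNAME": ",".join(ifnames),
    }


if __name__ == "__main__":
    sample = "mlx5_0 port 1 ==> ens2f0 (Up)\nmlx5_1 port 1 ==> ens2f1 (Down)"
    print(nccl_env(sample))
```

The resulting dictionary could then be copied into the container `env` of the RayService pod templates, or passed to Ray through `runtime_env={"env_vars": ...}`, depending on how your deployment sets environment variables.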
For more, see Nvidia Network Operator documentation and KubeRay integration. Would you like a full RayService CRD example with GPU and RDMA resources?
Sources: