Yes. To use InfiniBand or RoCE in Kubernetes, you typically need the NVIDIA (Mellanox) Network Operator to provision SR-IOV or MACVLAN secondary interfaces for RDMA (KubeRay integration). vLLM and Ray can use these secondary interfaces once the NCCL environment variables in your pod spec point at them: NCCL_SOCKET_IFNAME selects the IP interface NCCL uses for bootstrap and socket traffic (for example ib0, or the name of the SR-IOV interface inside the pod), and NCCL_IB_HCA, if you need it, selects the RDMA device itself. RayService is fully applicable; just ensure the same variables are set in both the head and worker pod templates so that every NCCL process uses the RDMA-capable network (example, distributed serving docs).
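As a rough illustration of setting the variables on both head and worker pods, here is a minimal Python sketch that builds the `env` entries and merges them into a RayService manifest loaded as a dict. The helper names `nccl_rdma_env` and `apply_env_to_rayservice` are hypothetical, and the values `net1`, `mlx5_0`, and GID index `3` are placeholders, not recommendations; check the actual interface and device names inside your pods. It assumes the usual RayService layout (`spec.rayClusterConfig.headGroupSpec` and `workerGroupSpecs`).

```python
def nccl_rdma_env(socket_ifname, ib_hca=None, gid_index=None):
    """Build Kubernetes container `env` entries for NCCL (hypothetical helper)."""
    # NCCL_SOCKET_IFNAME: IP interface for NCCL bootstrap and socket traffic.
    env = [{"name": "NCCL_SOCKET_IFNAME", "value": socket_ifname}]
    if ib_hca is not None:
        # NCCL_IB_HCA: RDMA device(s) NCCL should use, e.g. "mlx5_0" (placeholder).
        env.append({"name": "NCCL_IB_HCA", "value": ib_hca})
    if gid_index is not None:
        # NCCL_IB_GID_INDEX: mainly relevant for RoCE; the right value is fabric-specific.
        env.append({"name": "NCCL_IB_GID_INDEX", "value": str(gid_index)})
    return env


def apply_env_to_rayservice(ray_service, env):
    """Merge `env` into every container of the head and worker pod templates."""
    cluster = ray_service["spec"]["rayClusterConfig"]
    groups = [cluster["headGroupSpec"]] + cluster.get("workerGroupSpecs", [])
    for group in groups:
        for container in group["template"]["spec"]["containers"]:
            # Index existing entries by name so our values override duplicates.
            merged = {e["name"]: e for e in container.setdefault("env", [])}
            for entry in env:
                merged[entry["name"]] = dict(entry)
            container["env"] = list(merged.values())
    return ray_service
```

You would load your RayService YAML into a dict (for example with a YAML parser), call `apply_env_to_rayservice(manifest, nccl_rdma_env("net1", ib_hca="mlx5_0"))`, and write it back out. Note that environment variables alone do not attach the RDMA device; the pod still needs the secondary network and device resources provisioned by the operator.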
Set NCCL_SOCKET_IFNAME explicitly to the name of the RDMA-attached interface: in multi-homed pods, NCCL may otherwise pick the wrong interface, such as the default pod network. RayService will orchestrate the cluster as usual; what matters is correct pod-level network and environment configuration. Would you like a sample RayService CRD or pod spec for this setup?