First, a simplified description of what happens in the program. Most steps are identical for the server and the client. Initialize the InfiniBand context (the structures needed for communication and memory) ...
NCCL has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, and NVSwitch, as well as over networks using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs ...