bayindirh|5 years ago
Infiniband (IB) is not a network per se; it's a fabric built around host channel adapters (HCAs). It can run TCP/IP, but that's not its main application.
Every InfiniBand adapter is topology-aware: adapters know where the other nodes are, so they can talk to each other directly, regardless of the topology. The network is mapped, managed and maintained by a daemon called the subnet manager, which can run either on the switches or on a dedicated server.
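Conceptually, the subnet manager's job is plain graph bookkeeping: sweep the fabric, assign each port an address (a LID), and compute a forwarding table for every switch. A toy Python sketch of that idea, with a made-up two-switch fabric; this is only a conceptual model, not OpenSM's actual discovery or routing algorithm:

```python
from collections import deque

# Toy fabric as an adjacency list of hosts and switches (names invented
# for illustration). A real subnet manager discovers this by sweeping.
fabric = {
    "sw0": ["hostA", "hostB", "sw1"],
    "sw1": ["hostC", "hostD", "sw0"],
    "hostA": ["sw0"], "hostB": ["sw0"],
    "hostC": ["sw1"], "hostD": ["sw1"],
}

# Step 1: assign every node a LID (local identifier), as the SM does.
lids = {node: lid for lid, node in enumerate(sorted(fabric), start=1)}

# Step 2: per-switch forwarding table via BFS: for each destination LID,
# record which directly attached neighbor is the first hop toward it.
def forwarding_table(switch):
    table = {}
    visited = {switch}
    queue = deque()
    for nbr in fabric[switch]:
        visited.add(nbr)
        table[lids[nbr]] = nbr
        queue.append((nbr, nbr))  # (current node, first hop from switch)
    while queue:
        node, first_hop = queue.popleft()
        for nxt in fabric[node]:
            if nxt not in visited:
                visited.add(nxt)
                table[lids[nxt]] = first_hop
                queue.append((nxt, first_hop))
    return table

print(forwarding_table("sw0"))
```

With the tables in place, switches forward purely on destination LID; no per-packet route discovery is needed, which is part of why adapters can "just talk" to each other.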
This hardware and software combo results in three things:
1. Memory-to-memory transfers: IB can transfer from the RAM of one host directly into the RAM of another with RDMA. This means that when you run MPI and send a message to other processes, it lands straight in the RAM of the target(s). IB is transparent to MPI via its libraries, so everything works automatically and is dramatically faster.
2. Latency: A-to-B latency is around 2-5 ns (nanoseconds). This means that when you run things like MPI, the machines come as close to acting as one as they can. By the time Ethernet has even assembled your packet, you're already there, possibly having finished your transfer and gone back to churning through your code.
3. Speed: 40 Gbps IB means 38+ Gbps of real throughput, for every point-to-point connection, even when running through the core switch of a cube topology; 80 Gbps means around 78, and so on. The theoretical maximum and the sustained rate are not far apart: in most cases 100 means 100 sustained, 80 means 80 sustained, etc. (You can also attach storage devices to the IB network and enjoy that speed and latency for file access on your HPC compute nodes.)
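To put the "Ethernet is still assembling your packet" point in numbers, here is a back-of-the-envelope serialization-delay calculation. The link speeds and message sizes are illustrative only, and this counts wire time alone; real end-to-end latency adds NIC, switch and software overheads on top:

```python
def serialization_ns(n_bytes, gbps):
    """Time to clock n_bytes onto a link running at gbps Gbit/s, in ns.

    1 Gbit/s is exactly 1 bit/ns, so the division below yields nanoseconds.
    """
    return (n_bytes * 8) / gbps

# A full 1500-byte Ethernet frame on a 10 Gbit/s link takes ~1200 ns
# just to serialize onto the wire...
eth_frame = serialization_ns(1500, 10)

# ...while an 8-byte MPI payload on a 40 Gbit/s IB link clocks out in 1.6 ns.
small_msg = serialization_ns(8, 40)

print(f"1500 B @ 10 Gbit/s: {eth_frame:.0f} ns; 8 B @ 40 Gbit/s: {small_msg:.1f} ns")
```

Small messages simply spend very little time on a fast wire; the win for HPC workloads comes from keeping everything else (copies, kernel crossings) off the critical path too.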
Moreover, more modern cards and switching hardware accelerate MPI operations in hardware (broadcasts, atomics, reductions such as summation, etc.) and have multi-context support, so multiple MPI processes can drive the adapter without blocking each other as much as possible.
For HPC, it's a different universe of speed, latency and processing acceleration. You can run TCP/IP over it too, but we generally run a separate gigabit network for server management.
ilaksh|5 years ago
I'm just picturing something like Superman's crystal cave https://i.pinimg.com/564x/c9/d5/a4/c9d5a448c3c0eb98014e8be0d... with a bunch of computer modules plugged into an Infiniband connection system. It could be a collection of manycore ARM processor modules as well as Nvidia GPU and AI modules. You just keep plugging more of them in to build up your home supercomputer.
Then you use your Neuralink brain-computer interface (communicating with the home supercomputer cluster with an ultra-compact WiGig module) to "program" it by talking to an AI avatar that pops up in the middle of your living room (or whatever simulation you are replacing it with currently). The cluster runs the AI and the simulation.
rrss|5 years ago
> A to B latency is around 2-5 ns (nanoseconds).
What are A and B, and where did you get these numbers? HCA latency is more like ~500 ns.