ibeckermayer | 11 hours ago

Good thought... I think you're wrong because the dominant factor is bandwidth over the interconnect. In this case they're using 5 Gbps Ethernet; compare that to the 80-120 Gbps of a Thunderbolt 5-connected Mac Studio cluster: https://www.youtube.com/watch?v=bFgTxr5yst0
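For scale, here's a quick sketch comparing the raw link speeds mentioned above (the 1 GB payload is an arbitrary illustration, not a figure from the thread):

```python
# Compare time to move a hypothetical 1 GB payload over each link speed
# mentioned in the thread. Link speeds are in Gbps (bits, not bytes).
payload_bits = 1e9 * 8  # 1 GB expressed in bits

links = [
    ("5 Gbps Ethernet", 5),
    ("Thunderbolt 5 (low end)", 80),
    ("Thunderbolt 5 (high end)", 120),
]

for name, gbps in links:
    seconds = payload_bits / (gbps * 1e9)
    print(f"{name}: {seconds:.2f} s per GB")
```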

fc417fc802 | 10 hours ago

> I think you're wrong because the dominant factor is bandwidth over the interconnect.

Is it? Why do you say that? I understand inference to be almost entirely bottlenecked on memory bandwidth.

There are on the order of n^2 weights per layer but only n values in the hidden-state vector passed between layers. Transmitting a few thousand (or even tens of thousands) of floating-point values doesn't require a notable amount of bandwidth by modern standards.
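A rough sketch of that argument with assumed (not sourced) numbers: per token, each node streams its shard of the weights from local memory, but only a single hidden-state vector crosses the interconnect. The hidden size, layer count, per-layer parameter estimate, and 800 GB/s memory bandwidth below are all illustrative assumptions:

```python
# Back-of-envelope: interconnect traffic vs. local weight reads per token.
HIDDEN_DIM = 8192     # assumed hidden size (roughly 70B-class model)
N_LAYERS = 80         # assumed layer count
BYTES_PER_VAL = 2     # fp16
N_NODES = 2           # pipeline split across two machines

# One hidden-state vector crosses each node boundary per token.
activation_bytes = HIDDEN_DIM * BYTES_PER_VAL        # 16 KiB
link_seconds = activation_bytes * 8 / 5e9            # over 5 Gbps Ethernet

# Each node reads its half of the weights per token (dense model;
# ~12*d^2 parameters per transformer layer is a common rough estimate).
weight_bytes = (N_LAYERS // N_NODES) * 12 * HIDDEN_DIM**2 * BYTES_PER_VAL
mem_seconds = weight_bytes / 800e9                   # assumed 800 GB/s local memory

print(f"interconnect per token: {link_seconds * 1e6:.0f} us")
print(f"local weight reads per token: {mem_seconds * 1e3:.0f} ms")
```

Under these assumptions the per-token weight reads from local memory take roughly three orders of magnitude longer than shipping the activation vector over the wire, which is the reply's point.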

Training is an entirely different beast, of course. And depending on the workload, latency can also impact performance. But for running inference with a single query from a single user, I don't see how inter-node bandwidth is going to matter.