calaphos | 3 months ago
For example, the currently very popular Mixture of Experts architectures require a lot of all-to-all traffic (for expert parallelism), which works a lot better on the switched NVLink fabric, where traffic doesn't need to traverse multiple links as it does in the torus.
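A rough way to see the hop-count difference: the sketch below counts the average number of torus links a message crosses under uniform all-to-all traffic. It assumes minimal routing with wraparound links on every dimension, so a pair's distance is the sum of per-dimension ring distances. The 4x4x4 and 8x8x8 shapes are illustrative and not tied to any specific product. In an idealized single-tier switched fabric, every pair is one switch traversal regardless of size, which is the baseline to compare against.

```python
import itertools


def ring_distance(a: int, b: int, k: int) -> int:
    """Shortest distance between positions a and b on a ring of k nodes (with wraparound)."""
    d = abs(a - b)
    return min(d, k - d)


def torus_avg_hops(dims: tuple[int, ...]) -> float:
    """Average minimal hop count over all ordered pairs of distinct nodes in a torus.

    Assumes minimal routing, so the hop count for a pair is the sum of the
    ring distances in each dimension. Every hop occupies one more link, so this
    number also scales how much link bandwidth each all-to-all message consumes.
    """
    nodes = list(itertools.product(*(range(k) for k in dims)))
    total_hops = 0
    pairs = 0
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue  # a node does not send to itself
            total_hops += sum(ring_distance(s, d, k) for s, d, k in zip(src, dst, dims))
            pairs += 1
    return total_hops / pairs


if __name__ == "__main__":
    for shape in [(4, 4, 4), (8, 8, 8)]:
        print(shape, round(torus_avg_hops(shape), 2))
```

Under these assumptions a 4x4x4 torus averages about 3.05 hops per message and an 8x8x8 torus about 6.01, so the average path length grows with ring length, whereas the switched baseline stays at one switch traversal.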
zamadatix|3 months ago
markhahn|3 months ago
Bisection bandwidth is a useful metric, but is hop count? Per-hop cost tends to be pretty small.
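Whether hop count matters depends on how the per-hop cost compares to serialization time. The sketch below uses a simple cut-through latency model: a fixed delay per hop, plus the message serialized once at link rate. The 100 ns per-hop delay and 100 GB/s link rate are hypothetical round numbers chosen only for illustration, not measurements of NVLink or any torus interconnect.

```python
def cut_through_latency_s(hops: int, per_hop_s: float, msg_bytes: int, link_bytes_per_s: float) -> float:
    """Cut-through model: each hop adds a fixed router/switch delay, and the
    message is serialized once at link rate (the hops pipeline behind the head)."""
    return hops * per_hop_s + msg_bytes / link_bytes_per_s


def hop_fraction(hops: int, per_hop_s: float, msg_bytes: int, link_bytes_per_s: float) -> float:
    """Share of the total latency that comes from per-hop delay."""
    total = cut_through_latency_s(hops, per_hop_s, msg_bytes, link_bytes_per_s)
    return hops * per_hop_s / total


if __name__ == "__main__":
    PER_HOP_S = 100e-9      # hypothetical per-hop delay (100 ns)
    LINK_BYTES_S = 100e9    # hypothetical link bandwidth (100 GB/s)
    HOPS = 3                # assumed path length
    for size in (4096, 1 << 20):  # 4 KiB and 1 MiB messages
        total = cut_through_latency_s(HOPS, PER_HOP_S, size, LINK_BYTES_S)
        frac = hop_fraction(HOPS, PER_HOP_S, size, LINK_BYTES_S)
        print(f"{size} bytes: {total * 1e6:.2f} us total, {frac:.0%} from hops")
```

With these assumed numbers, hops account for roughly 88% of the latency of a 4 KiB message but only about 3% for a 1 MiB message. So per-hop cost is small for large transfers, while hop count still matters for small ones. Note that this model ignores the link bandwidth each extra hop consumes, which is where bisection bandwidth comes in.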
zamadatix|3 months ago