jfkfif | 1 year ago

The problem is multinode runs that communicate over the network.

freeone3000|1 year ago

Multinode runs don’t communicate through the network in a DGX configuration. NVLink allows for RDMA over direct InfiniBand, so there's no need for a network here.

tomoyoirl|1 year ago

InfiniBand is a network too…

But even setting that aside, you’ll access your data over a network connection, because these are expensive nodes running batch jobs with finite disk space, not personal workstations.

josh-sematic|1 year ago

Yes, and that matters most for training: a good GPU interconnect can be critical when training large models.