top | item 41205686

(no title)

trws | 1 year ago

Some computation tasks can tolerate the latency if they’re written with enough overlap and can keep enough of the data resident, but they usually need more performant networking than this. See older efforts like rcuda for remote cuda over infiniband as an example. It’s not ideal, but sometimes worth it. Usually the win is in taking a multi-GPU app and giving it 16 or 32 of them rather than a single remote GPU though.

discuss

No comments yet.