top | item 44566412

zcbenz | 7 months ago

In the absence of hardware unified memory, CUDA will automatically migrate data between the CPU and GPU when page faults occur.
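A minimal sketch of what this looks like in practice, assuming CUDA managed memory (`cudaMallocManaged`) on a platform where the GPU can take page faults. The kernel `add_one` and the helper `run_managed_demo` are hypothetical names made up for illustration. There is no explicit `cudaMemcpy`: the same pointer is touched on the CPU, then on the GPU, then on the CPU again, and the driver moves the pages as needed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1.0f to every element. On fault-capable GPUs the
// first touch of each page here triggers a migration from CPU memory.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int run_managed_demo(int n) {
    float* data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return -1;

    // Written on the CPU first, so the pages start out resident in host memory.
    for (int i = 0; i < n; ++i) data[i] = float(i);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_one<<<blocks, threads>>>(data, n);

    // The CPU must not touch the data until the kernel has finished.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(data); return -1; }

    // Reading on the CPU again brings the pages back to the host side.
    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] != float(i) + 1.0f) ++wrong;
    }
    cudaFree(data);
    return wrong;
}

int main() {
    int wrong = run_managed_demo(1 << 20);
    std::printf("mismatches: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

Build with something like `nvcc managed.cu -o managed`. On platforms without GPU page faulting the runtime may instead move the whole allocation around kernel launches, so treat the comments about per-page migration as an assumption about the hardware.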

fenced_load|7 months ago

There is also NVLink-C2C support between Nvidia's CPUs and GPUs that doesn't require any copy: CPUs and GPUs directly access each other's memory over a coherent bus. IIRC, they have 4 CPU + 4 GPU servers already available.

benreesman|7 months ago

Yeah, NCCL is a whole world and it's not even the only thing involved, but IIRC that's the difference between 8xH100 PCIe and 8xH100 SXM5.

saagarjha|7 months ago

This seems like it would be slow…

freeone3000|7 months ago

Matches my experience. It's memory stalls all over the place, aggravated by the fact that (on 12.3 at least) there wasn't even a prefetcher.
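When on-demand faulting stalls like this, one common mitigation is to issue an explicit prefetch hint before the kernel runs. The following is only a sketch under assumptions: it uses the CUDA 12.x form of `cudaMemPrefetchAsync` (pointer, byte count, destination device, stream), which may differ in other toolkit versions, and the names `scale` and `run_prefetch_demo` are hypothetical. The prefetch is treated as best-effort, since some devices reject it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by k.
__global__ void scale(float* data, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int run_prefetch_demo(int n) {
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return -1;

    float* data = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return -1;
    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // pages start on the CPU

    // Assumed CUDA 12.x signature. Ask the driver to move the pages to the GPU
    // in bulk instead of faulting them in one at a time inside the kernel.
    // Best-effort: ignore failure and clear the error state.
    if (cudaMemPrefetchAsync(data, bytes, device, 0) != cudaSuccess) {
        cudaGetLastError();
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(data, n, 2.0f);

    // Prefetch back to the host before the CPU reads the results.
    if (cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, 0) != cudaSuccess) {
        cudaGetLastError();
    }
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(data); return -1; }

    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] != 2.0f) ++wrong;
    }
    cudaFree(data);
    return wrong;
}

int main() {
    int wrong = run_prefetch_demo(1 << 20);
    std::printf("mismatches: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

The result is the same with or without the prefetch calls; only where the migration cost is paid changes, so any speedup would need to be measured on the actual hardware.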

nickysielicki|7 months ago