item 44566412 (no title)

zcbenz | 7 months ago
In the absence of hardware unified memory, CUDA will automatically copy data between CPU and GPU when page faults occur.

  fenced_load | 7 months ago
  There is also NVLink-C2C support between Nvidia's CPUs and GPUs, which doesn't require any copy: CPUs and GPUs directly access each other's memory over a coherent bus. IIRC, they have 4 CPU + 4 GPU servers already available.

    benreesman | 7 months ago
    Yeah, NCCL is a whole world and it's not even the only thing involved, but IIRC that's the difference between 8xH100 PCIe and 8xH100 SXM5.

  unknown | 7 months ago
  [deleted]

  saagarjha | 7 months ago
  This seems like it would be slow…

    freeone3000 | 7 months ago
    Matches my experience. It's memory stalls all over the place, aggravated by the fact that (on 12.3 at least) there wasn't even a prefetcher.

  nickysielicki | 7 months ago
  See also: https://www.kernel.org/doc/html/v5.0/vm/hmm.html
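For readers who want to see the behavior discussed at the top of the thread, here is a minimal sketch of fault-driven migration with CUDA managed memory, plus an explicit prefetch hint of the kind that can reduce the stalls mentioned above. It is an illustration under assumptions, not anyone's production code: the kernel `scale`, the helper `prefetch_to_device`, and the buffer size are made up for the example, and the `CUDART_VERSION` guard reflects my understanding that CUDA 13 changed `cudaMemPrefetchAsync` to take a `cudaMemLocation`, so treat that branch as an assumption to check against your toolkit.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s failed: %s\n", #call,                \
                    cudaGetErrorString(err_));                       \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Hypothetical kernel: multiplies every element by `factor`.
__global__ void scale(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: ask the driver to migrate the range to `device`
// ahead of time instead of waiting for GPU page faults.
static void prefetch_to_device(const void* ptr, size_t bytes, int device) {
#if CUDART_VERSION >= 13000
    // Assumption: CUDA 13+ signature taking a cudaMemLocation and flags.
    cudaMemLocation loc = {};
    loc.type = cudaMemLocationTypeDevice;
    loc.id = device;
    CHECK(cudaMemPrefetchAsync(ptr, bytes, loc, 0, 0));
#else
    // CUDA 12.x and earlier: destination device id, then stream.
    CHECK(cudaMemPrefetchAsync(ptr, bytes, device, 0));
#endif
}

int main() {
    const size_t n = 1 << 24;               // 16M floats, 64 MiB
    const size_t bytes = n * sizeof(float);
    const int device = 0;
    CHECK(cudaSetDevice(device));

    // One pointer valid on both CPU and GPU; pages migrate on demand.
    float* data = nullptr;
    CHECK(cudaMallocManaged(&data, bytes));

    // First touch on the CPU, so the pages start out in host memory.
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    // Prefetching managed memory is only supported where the device
    // reports concurrent managed access; otherwise rely on faults alone.
    int concurrent = 0;
    CHECK(cudaDeviceGetAttribute(&concurrent,
                                 cudaDevAttrConcurrentManagedAccess, device));
    if (concurrent) prefetch_to_device(data, bytes, device);

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(data, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Reading on the CPU faults the touched page back to host memory.
    printf("data[0] = %.1f\n", data[0]);

    CHECK(cudaFree(data));
    return 0;
}
```

Commenting out the `prefetch_to_device` call and comparing kernel times under a profiler is one way to see how much of the runtime goes to fault handling on a given system; the difference will vary with the GPU, driver, and interconnect.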