poorlyknit | 2 years ago
> Currently, only NVIDIA v100 GPU simulation is supported, and all GPUs mentioned later in this post are simulated v100s.
So unless they can reproduce this on a real cluster of V100s, it should probably be taken with a grain of salt. What I'm missing in the simulation is an accurate account of how the host-device barrier is sped up using this system. Reading the headline, I was hoping they'd be using NVIDIA GPUDirect to access the storage directly from the devices, but I believe you'd need very custom CUDA for that...