samspenc|1 year ago

Fascinating: despite the significantly better specs (and VRAM) on the AMD MI300X, the Nvidia H100 seems to match its performance at lower batch sizes, and only loses out slightly at larger batches. I'm guessing the differentiator is mostly VRAM (192 GB on the MI300X vs. 80 GB on the Nvidia chip).

Does anyone know if this is just due to ROCm vs CUDA implementations? Or something else?

spott|1 year ago

> Does anyone know if this is just due to ROCm vs CUDA implementations? Or something else?

I expect the AMD card also loses out once multi-GPU setups become necessary (which is arguably going to happen at much larger model sizes than on the H100, but a 70B-parameter model trained in bf16 is going to hit multi-GPU memory requirements), as its interconnect is just way slower.
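A rough back-of-envelope check of that memory claim (a sketch only; the per-parameter byte counts assume a typical Adam mixed-precision training recipe, which the thread doesn't specify, and activations are ignored):

```python
# Estimate training memory for a 70B-parameter model in bf16.
# Assumed breakdown (typical mixed-precision Adam, hypothetical here):
#   bf16 weights (2 B) + bf16 gradients (2 B)
#   + fp32 master weights (4 B) + fp32 Adam moments (8 B)
#   = 16 bytes per parameter, activations excluded.
params = 70e9
bytes_per_param = 2 + 2 + 4 + 8
total_gb = params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB")  # ~1120 GB: far beyond one H100 (80 GB) or MI300X (192 GB)
```

Even with inference-style weights alone (70e9 × 2 B = 140 GB), bf16 weights already exceed a single H100's 80 GB, while fitting on one MI300X's 192 GB.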