top | item 35249176

neilmovva | 2 years ago

A bit underwhelming - H100 was announced at GTC 2022, and represented a huge stride over A100. But a year later, H100 is still not generally available at any public cloud I can find, and I haven't yet seen ML researchers reporting any use of H100.

The new "NVL" variant adds ~20% more memory per GPU by enabling the sixth HBM stack (previously only five out of six were used). Additionally, GPUs now come in pairs with 600GB/s bandwidth between the paired devices. However, the pair then uses PCIe as the sole interface to the rest of the system. This topology is an interesting hybrid of the previous DGX (put all GPUs onto a unified NVLink graph), and the more traditional PCIe accelerator cards (star topology of PCIe links, host CPU is the root node). Probably not an issue, I think PCIe 5.0 x16 is already fast enough to not bottleneck multi-GPU training too much.
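A quick back-of-envelope sketch of the claim that PCIe 5.0 x16 won't bottleneck too badly, compared with the intra-pair NVLink. The bandwidth constants and the 7B-parameter model are illustrative assumptions, not measurements:

```python
# Rough per-link bandwidths (GB/s, per direction; idealized, ignoring
# protocol overhead and real collective-communication patterns):
PCIE5_X16_GBPS = 64     # PCIe 5.0 x16, ~64 GB/s per direction
NVLINK_PAIR_GBPS = 600  # NVLink bandwidth quoted for the H100 NVL pair

def transfer_seconds(payload_gb: float, link_gbps: float) -> float:
    """Idealized time to push `payload_gb` of data over a link."""
    return payload_gb / link_gbps

# Hypothetical payload: fp16 gradients for a 7B-parameter model = 14 GB.
grads_gb = 7e9 * 2 / 1e9

pcie_s = transfer_seconds(grads_gb, PCIE5_X16_GBPS)
nvlink_s = transfer_seconds(grads_gb, NVLINK_PAIR_GBPS)
print(f"PCIe 5.0 x16: {pcie_s:.3f} s   NVLink pair: {nvlink_s:.3f} s")
```

Under these toy numbers the PCIe hop is roughly 10x slower than the NVLink hop, but still a fraction of a second per full gradient exchange, which is where the "probably not a bottleneck" intuition comes from.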

binarymax|2 years ago

It is interesting that Hopper isn’t widely available yet.

I have seen some benchmarks from academia but nothing in the private sector.

I wonder if they thought they were moving too fast and wanted to milk Ampere/Ada as long as possible.

Not having any competition whatsoever means Nvidia can release what they like when they like.

pixl97|2 years ago

The question is: do they not have much production, or are OpenAI and Microsoft buying every single one they produce?

TylerE|2 years ago

Why bother when you can get cryptobros paying way over MSRP for 3090s?

__anon-2023__|2 years ago

Yes, I was expecting a RAM-doubled edition of the H100; this is just a higher-binned version of the same part.

I got an email from vultr, saying that they're "officially taking reservations for the NVIDIA HGX H100", so I guess all public clouds are going to get those soon.

rerx|2 years ago

You can also join a pair of regular PCIe H100 GPUs with an NVLink bridge. So that topology is not so new either.

ksec|2 years ago

>H100 was announced at GTC 2022, and represented a huge stride over A100. But a year later, H100 is still not generally available at any public cloud I can find

You can safely assume an entity bought as many as they could.