A bit underwhelming - H100 was announced at GTC 2022, and represented a huge stride over A100. But a year later, H100 is still not generally available at any public cloud I can find, and I haven't yet seen ML researchers reporting any use of H100.
The new "NVL" variant adds ~20% more memory per GPU by enabling the sixth HBM stack (previously only five out of six were used). Additionally, GPUs now come in pairs with 600GB/s bandwidth between the paired devices. However, the pair then uses PCIe as the sole interface to the rest of the system. This topology is an interesting hybrid of the previous DGX (put all GPUs onto a unified NVLink graph), and the more traditional PCIe accelerator cards (star topology of PCIe links, host CPU is the root node). Probably not an issue, I think PCIe 5.0 x16 is already fast enough to not bottleneck multi-GPU training too much.
Yes. I was expecting a RAM-doubled edition of the H100; this is just a higher-binned version of the same part.
I got an email from vultr, saying that they're "officially taking reservations for the NVIDIA HGX H100", so I guess all public clouds are going to get those soon.
>H100 was announced at GTC 2022, and represented a huge stride over A100. But a year later, H100 is still not generally available at any public cloud I can find
You can safely assume an entity bought as many as they could.
I was wondering today if we would start to see the reverse of this: small ASICs, or some kind of LLM-optimized GPU, for desktops or maybe even laptops and mobile. I think it's evident that LLMs are here to stay and will be a major part of computing for a while. Getting this local, so we aren't reliant on clouds, would be a huge boon for personal computing. Even if it's a "worse" experience, being able to load an LLM onto our computer, tell it to only look at this directory, and have it help out would be cool.
In fact, Qualcomm has announced a "Cloud AI" PCIe card designed for inference (as opposed to training & inference) [1, 2]. It's populated with NSPs like the ones in mobile SoCs.
[1] https://www.qualcomm.com/products/technology/processors/clou...
[2] https://github.com/quic/software-kit-for-qualcomm-cloud-ai-1...
Apple, Intel, AMD, Qualcomm, Samsung, etc. already have "neural engines" in their SoCs. These engines continue to evolve to better support common types of models.
Why is the sentiment here so much that LLMs will somehow be decentralized and run locally at some point? Has the story of the internet so far not been that centralization has pretty much always won?
Nvidia don't want consumers using consumer GPUs for business.
If you are a business user then you must pay Nvidia gargantuan amounts of money.
This is the outcome of a market leader with no real competition - you pay much more for lower power than the consumer GPUs, and you are forced into using their business GPUs through software license restrictions on the drivers.
Nvidia can't do a large 'consumer' card without cannibalizing their commercial ML business. ATI doesn't have that problem.
ATI seems to be holding the idiot ball.
Port Stable Diffusion and CLIP to their hardware. Train an upsized version sized for a 48GB card. Release a prosumer 48GB card... get huge uptake from artists and creators using the tech.
GPUs are going to be weird, underconfigured and overpriced until there is real competition.
Whether or not there is real competition depends entirely on whether Intel's Arc line of GPUs stays in the market.
AMD strangely has decided not to compete. Its newest GPU, the 7900 XTX, is an extremely powerful card, close to the top-of-the-line Nvidia RTX 4090 in raster performance.
If AMD had introduced it with an aggressively low price then they could have wedged Nvidia, which is determined to exploit its market dominance by squeezing the maximum money out of buyers.
Instead, AMD has decided to simply follow Nvidia in squeezing for maximum prices, with AMD prices slightly behind Nvidia's.
It's a strange decision from AMD, which is well behind in market share and apparently disinterested in increasing that share by competing aggressively.
So a third player is needed - Intel - it's a lot harder for three companies to sit on outrageously high prices for years rather than compete with each other for market share.
The really interesting upcoming LLM products are from AMD and Intel... with catches.
- The Intel Falcon Shores XPU is basically a big GPU that can use DDR5 DIMMs directly, hence it can fit absolutely enormous models into a single pool. But it has been delayed to 2025 :/
- AMD have not mentioned anything about the (not delayed) MI300 supporting DIMMs. If it doesn't, it's capped to 128GB, and it's being marketed as an HPC product like the MI200 anyway (which you basically cannot find on cloud services).
Nvidia also has some DDR5 Grace CPUs, but the memory is embedded and I'm not sure how much of a GPU they have. Other startups (Tenstorrent, Cerebras, Graphcore and such) seem to have underestimated the memory requirements of future models.
That's the problem. Good DDR5 RAM's bandwidth is <100GB/s, while an Nvidia card has up to 2TB/s, and memory bandwidth is still the bottleneck for most applications.
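To put rough numbers on that for LLM inference - a back-of-the-envelope sketch, assuming single-stream decoding is purely memory-bandwidth bound and using a hypothetical 70B-parameter fp16 model (it ignores batching and the KV cache):

    # Each generated token streams roughly all the weights from memory once,
    # so bandwidth / model size gives a crude upper bound on tokens per second.
    params = 70e9                # hypothetical 70B-parameter model
    weight_bytes = params * 2    # fp16 -> ~140 GB of weights

    for name, bw in [("DDR5 host memory", 100e9), ("H100-class HBM", 2e12)]:
        print(f"{name}: ~{bw / weight_bytes:.1f} tokens/s upper bound")

    # prints ~0.7 tokens/s for DDR5 vs ~14.3 tokens/s for HBM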
I wonder how soon we'll see something tailored specifically for local applications. Basically just tons of VRAM to be able to load large models, but not bleeding edge perf. And eGPU form factor, ideally.
The Apple M-series CPUs with unified RAM are interesting in this regard. You can get a 16-inch MBP with an M2 Max and 96GB of RAM for $4300 today, and I expect the M2 Ultra to go to 192GB.
I'm not an ML scientist by any means, but perf seems as important as RAM from what I'm reading. Running prompts with internal chain of thought (eating up more TPU time) appears to give much better output.
I'm super duper curious if there are ways to glob together VRAM between consumer-grade hardware to make this whole market more accessible to the common hacker?
I remember reading about a guy who soldered 2GB VRAM modules on his 3060 12GB (replacing the 1GB modules) and was able to attain 24GB on that card. Or something along those lines.
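Short of soldering, the usual software answer to pooling VRAM is to shard one model's layers across several cards so their memory adds up - it behaves like a pipeline, not one big fast pool. A minimal sketch with Hugging Face transformers + accelerate (the checkpoint name is a placeholder, not a real model):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "some-org/some-30b-model"   # placeholder checkpoint name
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype=torch.float16,
        device_map="auto",   # accelerate spreads layers across cuda:0, cuda:1, ...
    )

    inputs = tok("The H100 NVL is", return_tensors="pt").to(model.device)
    print(tok.decode(model.generate(**inputs, max_new_tokens=30)[0]))

Latency is worse than one big card because the GPUs take turns layer by layer, but it gets models that don't fit in 24GB running at all.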
How is this card (which is really two physical cards occupying 2 PCIe slots) exposed to the OS? Does it show up as a single /dev/gfx0 device, or is the unification a driver trick?
What exactly is an SXM5 socket? It sounds like a PCIe competitor, but proprietary to nvidia. Looking at it, it seems specific to nvidia DGX (mother?)boards. Is this just a "better" alternative to PCIe (with power delivery, and such), or fundamentally a new technology?
The TDP row in the comparison table must be in error. It shows the card with dual GH100 GPUs at 700W and the one with a single GH100 GPU at 700-800W ?!
I would sell a kidney for one of these. It's basically impossible to train language models on a consumer 24GB card. The jump up is the A6000 ADA, at 48GB for $8,000. This one will probably be priced somewhere in the $100k+ range.
Use 4 consumer-grade 4090s then. It would be much cheaper and better in almost every aspect. Also, even with this, forget about training foundational models: Meta spent 82k GPU-hours on the smallest LLaMA and 1M hours on the largest.
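To make that concrete - a rough sketch that charitably assumes a 4090 matches the per-GPU throughput Meta actually got (the GPU-hour figures are the ones quoted above):

    # Wall-clock time for the published LLaMA GPU-hour budgets on a 4-GPU box.
    n_gpus = 4
    for name, gpu_hours in [("smallest LLaMA", 82_000), ("largest LLaMA", 1_000_000)]:
        years = gpu_hours / n_gpus / (24 * 365)
        print(f"{name}: ~{years:.1f} years on {n_gpus} GPUs")

    # smallest LLaMA: ~2.3 years on 4 GPUs
    # largest LLaMA: ~28.5 years on 4 GPUs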
I was just saying to a colleague the day before this announcement that the inevitable consequence of the popularity of large language models will be GPUs with more memory.
Previously, GPUs were designed for gamers, and no game really "needs" more than 16 GB of VRAM. I've seen reviews of the A100 and H100 cards saying that the 80GB is ample for even the most demanding usage.
Now? Suddenly GPUs with 1 TB of memory could be immediately used, at scale, by deep-pocket customers happy to throw their entire wallets at NVIDIA.
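Rough numbers on why that demand is effectively unbounded - a sketch assuming fp16 weights and using GPT-3-scale 175B parameters as an example; the 16 bytes per parameter for training is the usual mixed-precision Adam rule of thumb (weights, gradients, and optimizer state), before activations:

    params = 175e9
    inference_gb = params * 2 / 1e9    # fp16 weights alone: ~350 GB
    training_gb = params * 16 / 1e9    # weights + grads + Adam state: ~2800 GB
    print(f"inference: ~{inference_gb:.0f} GB, training state: ~{training_gb:.0f} GB")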
This new H100 NVL model is a Frankenstein's monster stitched together from whatever they had lying around. It's a desperate move to corner the market as early as possible. It's just the beginning, a preview of the times to come.
There will be a new digital moat, a new capitalist's empire, built upon the scarcity of cards "big enough" to run models that nobody but a handful of megacorps can afford to train.
In fact, it won't be enough to restrict access by making the models expensive to train. The real moat will be models too expensive to run. Users will have to sign up, get API keys, and stand in line.
"Safe use of AI" my ass. Safe profits, more like. Safe monopolies, safe from competition.
AMD seems to be focusing on traditional HPC - they've got a ton of 64-bit FLOPS in their recent commercial model. I expect their server GPUs are mostly for chasing supercomputer contracts, which can be pretty lucrative, while they cede model training to Nvidia.
binarymax|2 years ago
I have seen some benchmarks from academia but nothing in the private sector.
I wonder if they thought they were moving too fast and wanted to milk Ampere/Ada as long as possible.
Not having any competition whatsoever means Nvidia can release what they like when they like.
ethbr0|2 years ago
For anything that can be run remotely, it'll always be deployed and optimized server-side first. Higher utilization means more economy.
Then trickle down to local and end user devices if it makes sense.
enlyth|2 years ago
It was a slap in the face when the 4090 had the same memory capacity as the 3090.
A6000 is 5000 dollars, ain't no hobbyist at home paying for that.
dragontamer|2 years ago
Since Intel GPUs are again TSMC manufactured, you really aren't going to see price improvements unless Intel subsidizes all of this.
koheripbal|2 years ago
It's also enormously more expensive and I'm not sure if you can buy it new without getting the nvidia compute server.