item 39767003

stephbu | 1 year ago

Ironically, the power per cycle is decreasing: power and thermal dissipation are really the limits NVIDIA is exploring. It’s what the software does with those cycles that is leaping exponentially.

paulmd | 1 year ago

The other bottlenecks they are studiously exploring and minimizing are beachfront area and networking/interconnect bandwidth.

Nvidia went all-in on InfiniBand SerDes while AMD chose PCIe/CXL. But since PCIe signaling requirements are tighter, you need bigger, stronger PHYs, which means you get less usable bandwidth per unit of beachfront. The penalty of Nvidia’s approach is latency/power, but who cares when GPUs are latency-hiding machines anyway?

https://www.semianalysis.com/p/cxl-is-dead-in-the-ai-era

https://www.semianalysis.com/nvidia-b100-b200-gb200-cogs-pri...

This in turn means that Nvidia can implement more links or bigger links in their NVSwitch networks, which means they can construct bigger systems and push the TCO down.

Two 7900 XTXs are still functionally one 7900 XTX, but two 3090s are functionally a 48GB card. Nvidia has got the interconnect bandwidth to a point where it’s a significant enough fraction of the local bandwidth for the whole system to act as one single GPU; this is the same argument as MI300X etc. It doesn’t matter whether the link is on-package or off-package; what matters is that it’s a significant fraction of the speed of your local memory or cache ports. Nvidia did that with large numbers of GPUs, not just a pair of chiplets.
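A rough back-of-the-envelope sketch of the two quantities the argument above turns on: pooled memory capacity (a 3090 has 24 GB, so two pooled cards give 48 GB) and link bandwidth as a fraction of local memory bandwidth. The helper names, the bandwidth figures, and the 10% "significant fraction" cut-off are all hypothetical placeholders chosen for illustration, not vendor specs or the commenter's numbers.

```python
# Back-of-the-envelope sketch of the "N GPUs behave like one big GPU" argument.
# Function names, bandwidth figures and the threshold are hypothetical placeholders.

def pooled_capacity_gb(per_gpu_gb: float, n_gpus: int, can_pool: bool) -> float:
    """Usable memory if the interconnect lets the GPUs share one working set.

    Without a usable interconnect, each GPU effectively only sees its own memory.
    """
    return per_gpu_gb * n_gpus if can_pool else per_gpu_gb


def link_fraction(link_gb_s: float, local_gb_s: float) -> float:
    """Interconnect bandwidth expressed as a fraction of local memory bandwidth."""
    return link_gb_s / local_gb_s


def acts_like_one_gpu(link_gb_s: float, local_gb_s: float, threshold: float = 0.1) -> bool:
    # Hypothetical cut-off: treat the link as "a significant fraction" of local
    # bandwidth once it reaches `threshold` of it. The 0.1 default is an assumption.
    return link_fraction(link_gb_s, local_gb_s) >= threshold


if __name__ == "__main__":
    # Two 24 GB cards, with and without a usable interconnect (capacity only).
    print(pooled_capacity_gb(24, 2, can_pool=True))   # 48
    print(pooled_capacity_gb(24, 2, can_pool=False))  # 24
    # Placeholder bandwidths in GB/s, chosen only to illustrate the ratio.
    print(link_fraction(100.0, 1000.0))               # 0.1
    print(acts_like_one_gpu(100.0, 1000.0))           # True
```

The point of separating the two functions is that capacity pooling and bandwidth fraction are different claims: a link can technically pool memory while still being too slow, relative to local memory, for the system to behave like one GPU.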

Nvidia has been thinking about this one for a long time: NVSwitch is on its third generation and can switch literal terabytes per second per switch, times several switches. The Mellanox purchase is part of it too, but it goes back much further.

And unlike AMD, they actually have a driver that works, trivially exposes these capabilities, and gets out of the way. If you want to tinker and build the open alternative, that’s fine; other people want to get work done.

This is shocking to many AMD fanboys, but Jensen is actually a good engineer too. Nvidia is mostly on top because they sell products that people want (to such a relentless degree that people get furious if they don’t get faster every year, etc.) and that cannot be trivially displaced by “just as good” Radeon drivers; just see the latest installment of the geohot saga. Nobody is trapped by Nvidia; it is a golden cage. What you’re buying is getting actual work done, or just going and playing a game, instead of spending hours with regedit hacks to disable dxnavi to fix DX11 shader compilation stutter.

https://twitter.com/__tinygrad__/status/1770160392389771305

https://old.reddit.com/search/?q=Dxnavi+stutter+&include_ove...

Nvidia is on top because of relentlessly competent engineering and savant-level business direction, and as much as people scoff at the idea… that’s literally the reason you hate him, lol. He is a Jobs-like visionary figure who can see what the tech can be and align the engineering and business factors over the long term to get where he wants to go, while also providing the funding and profit in the short term.

https://m.youtube.com/watch?v=Xn1EsFe7snQ&t=1034

The only company with comparable parasocial negative attachment is Apple, and it’s for the exact same underlying reason. People are also systematically unable to understand that Apple users are not “trapped” or in need of rescuing either. People buy Apple because it does what they want really well, and they don’t care about installing Linux on their phones. And nerds resent that deeply. It’s not a coincidence that there’s this axis of warfare around both Nvidia and the App Store, with the EU and so on. Nerds cannot abide someone choosing the “wrong” hardware. They are right, and you will buy the same thing as them, or they will get the EU to outlaw your product, or change the symbol licensing to prevent you from running on Linux, etc. If you don't use the same filesystem as me, obviously that means I get to relicense some symbols that have been there for 20+ years and break your filesystem. Btrfs is better; the council has spoken.

It keeps happening for a reason, folks, lol. Nerds can’t tolerate others making different choices. And those users disproportionately self-select into “nerd” platforms like Android and AMD.