(no title)
m4r1k | 3 months ago
The B200 wins on raw FP8 throughput (~9000 vs 4614 TFLOPS), which makes sense given NVIDIA has optimized for the single-chip game for over 20 years. But the bottleneck here isn't the chip; it's the domain size.
NVIDIA's top-tier NVL72 tops out at an NVLink domain of 72 Blackwell GPUs. Meanwhile, Google is connecting 9216 chips at 9.6 Tbps to deliver nearly 43 ExaFLOPS. NVIDIA has the ecosystem (CUDA, community, etc.), but until they can match that interconnect scale, they simply don't compete in this weight class.
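For anyone who wants to check the arithmetic, here is a rough sketch that multiplies the per-chip FP8 figures quoted above by the domain sizes. It assumes peak throughput scales linearly with chip count, which real workloads won't achieve, and it only compares single interconnect domains. The function name `domain_exaflops` is just made up for illustration.

```python
# Peak FP8 throughput per interconnect domain, using the figures quoted
# in the comment above. Assumes perfect linear scaling (an idealization).
TFLOPS_PER_EFLOPS = 1_000_000


def domain_exaflops(chips: int, tflops_per_chip: float) -> float:
    """Aggregate peak throughput of one domain, in ExaFLOPS."""
    return chips * tflops_per_chip / TFLOPS_PER_EFLOPS


tpu_pod = domain_exaflops(9216, 4614)  # 9216 TPU chips at 4614 TFLOPS each
nvl72 = domain_exaflops(72, 9000)      # 72 B200 GPUs at ~9000 TFLOPS each

print(f"TPU pod: {tpu_pod:.2f} EFLOPS")
print(f"NVL72:   {nvl72:.3f} EFLOPS")
print(f"ratio:   {tpu_pod / nvl72:.1f}x")
```

So the per-chip gap of roughly 2x in NVIDIA's favor is swamped by the 128x difference in domain size, at least on paper.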
cwzwarich|3 months ago
smilekzs|3 months ago
Same logic as when NVIDIA quotes the "bidirectional bandwidth" of high-speed interconnects to make the numbers look big, instead of the more common bandwidth per direction, forcing everyone else to adopt the same metric in marketing materials.
7e|3 months ago
markhahn|3 months ago
oivey|3 months ago
overfeed|3 months ago
No surprises there, Google is not the greatest company at productizing their tech for external consumption.
> The other players are certainly more than just competing with Google.
TBF, it's easy to stay in the game when you're flush with cash, and for the past N quarters, investors have been throwing money at AI companies. Nvidia's margins have greatly benefited from this largesse. There will be blood on the floor once investors start demanding returns on their investments.
PunchyHamster|3 months ago
Ecosystem is a MASSIVE factor and will be a massive factor for all but the biggest models.
epolanski|3 months ago
Also, I feel you completely miss that the problem isn't how fast ONE GPU is vs ONE TPU; what matters is the cost for the same output. If I can fill a datacenter at half the cost for the same output, does it matter that I've used twice the TPUs and that a single Nvidia Blackwell was faster? No...
And hardware cost isn't even the biggest problem; operational costs, mostly power and cooling, are another huge one.
So if you design a solution that fits your stack (designed for it) and optimize for your operational costs, you're light years ahead of a competitor using the more powerful solution that costs 5 times more in hardware and twice as much to operate.
Everything I said is more or less true for inference economics; I have no clue about training.
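To put numbers on the cost argument, here is a minimal sketch of total cost for the same output. The 5x hardware and 2x operational multipliers come from the comment above; the baseline costs and the four-year horizon are hypothetical placeholders in normalized units, not real prices.

```python
# Total cost of delivering the same output over a deployment period.
# Baseline values are hypothetical, normalized units (not real prices).

def total_cost(hardware: float, opex_per_year: float, years: float) -> float:
    """Up-front hardware cost plus operating cost over the period."""
    return hardware + opex_per_year * years


YEARS = 4  # hypothetical deployment horizon

# In-house design tuned to its own stack (hypothetical baseline).
custom_hw, custom_opex = 1.0, 0.5

# More powerful off-the-shelf option: 5x hardware, 2x operating cost,
# the multipliers used in the comment above.
general_hw, general_opex = 5.0 * custom_hw, 2.0 * custom_opex

custom = total_cost(custom_hw, custom_opex, YEARS)
general = total_cost(general_hw, general_opex, YEARS)
ratio = general / custom

print(f"custom:  {custom:.1f}")
print(f"general: {general:.1f}")
print(f"ratio:   {ratio:.1f}x")
```

Whatever baseline you plug in, the overall ratio lands somewhere between 2x and 5x: closer to 5x when hardware dominates the bill, closer to 2x when power and cooling do.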