summerlight|10 months ago
Lots of other factors. I suspect this is one of the reasons why Google cannot offer TPU hardware outside of its cloud service. A significant chunk of TPU efficiency can be attributed to external factors which customers cannot easily replicate.
xnx|10 months ago
Google does many pieces of the data center better. Google TPUs use 3D torus networking and are liquid cooled.
> What even is an AI data center?
Being newer, AI installations have more variation/innovation than traditional data centers. Google's competitors have not yet adopted all of Google's advances.
> are the GPU/TPU boxes in a different building than the others?
Not that I've read. They are definitely bringing new data centers online, but I don't know if they are designed from the start for pure-AI workloads.
nsteel|10 months ago
Wouldn't a 3D torus network have horrible performance with 9,216 nodes? And really horrible latency? I'd have assumed a traditional spine-leaf topology would do better. But I must be wrong, as they're claiming their latency is great here. Of course, they provide zero actual evidence of that.
And I'll echo: what even is an AI data center? Because we're still none the wiser.
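A quick back-of-envelope helps here: in a torus, hop counts grow roughly with the cube root of node count, not linearly, so even at 9,216 nodes the worst-case path is only a few dozen hops. The sketch below computes worst-case and average hop counts for a 3D torus; the 9,216 figure comes from the thread, but the 16×24×24 dimension split is purely an illustrative assumption, not Google's actual topology.

```python
def ring_dist(a, b, k):
    """Shortest distance between nodes a and b on a ring of k nodes,
    taking the wraparound link into account."""
    d = abs(a - b)
    return min(d, k - d)

def torus_worst_and_avg(dims):
    """Worst-case and average hop count between two nodes of a torus
    whose per-dimension sizes are `dims`. Distances in a torus are the
    sum of independent per-dimension ring distances."""
    worst = sum(k // 2 for k in dims)
    # For even k, the average ring distance over all ordered node pairs
    # (including a node paired with itself) works out to exactly k/4,
    # and the per-dimension averages simply add.
    avg = sum(k / 4 for k in dims)
    return worst, avg

dims = (16, 24, 24)          # assumed split: 16 * 24 * 24 = 9,216 nodes
worst, avg = torus_worst_and_avg(dims)
print(worst, avg)            # 32 hops worst case, 16.0 on average
```

So the latency question turns on per-hop cost times a few dozen hops versus a spine-leaf fabric's fixed small hop count through much higher-radix (and typically more oversubscribed) switches; raw hop count alone doesn't settle it either way.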