nighthawk454 | 3 years ago:

> stretched our ML supercomputer scale ... to 4096 TPU v4 nodes

> The Google tradition is to write retrospective papers ... TPU v4s and A100s deployed in 2020 and both use 7nm technology

> The appropriate H100 match would be a successor to TPU v4 deployed in a similar time frame and technology (e.g., in 2023 and 4 nm).

> TPU v4 supercomputers [are] the workhorses of large language models (LLMs) like LaMDA, MUM, and PaLM. These features allowed the 540B parameter PaLM model to sustain a remarkable 57.8% of the peak hardware floating point performance over 50 days while training on TPU v4 supercomputers

> Google has deployed dozens of TPU v4 supercomputers for both internal use and for external use via Google Cloud

> Moreover, the large size of the TPU v4 supercomputer and its reliance on OCSes looks prescient given that the design began two years before the paper was published that has stoked the enthusiasm for LLMs

ttul | 3 years ago:

Brought to you by the future: "Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired."

sanxiyn | 3 years ago:

As the paper explains, optical circuit switches are not new in TPU v4 and are not the main topic of this paper. Google was already using them for datacenter networking and published about it last year. For details, see https://arxiv.org/abs/2208.10041.

pclmulqdq | 3 years ago:

The future of the 90's. Optical matrix switches like this have been around for a long time. These aren't doing packet switching (which honestly would be the future if done optically); it's more of a layer 1 thing: the switch replaces you plugging and unplugging a cable. Bell Labs was building these kinds of switches back in the day.

abcdabcd987 | 3 years ago:

[1] Jupiter Evolving: Transforming Google's Datacenter Network via Optical Circuit Switches and Software-Defined Networking. https://research.google/pubs/pub51587/

rpcope1 | 3 years ago:

Is there any way to purchase anything like a TPU? I guess the Cerebras Andromeda product is one, but I don't know if those are sold or leased. Any others?

CGamesPlay | 3 years ago:

https://www.cerebras.net/andromeda/

amrb | 3 years ago:

https://tenstorrent.com/

dna_polymerase | 3 years ago:

wiz21c | 3 years ago:

From the page: "Andromeda, a 13.5 Million Core AI Supercomputer". Blown away by the number of cores (I considered myself lucky to have two 10,000+ core GPUs in my workstation), I then realized that the word "core" is singular in that sentence. Is it just a mistake, or does it mean something else? (Genuine question; English is not my first language.)

EDIT: Ahhh, a bit below on the page it is written "13.5 million AI-optimized cores", and there it's plural. So it was probably just a mistake.

svantana | 3 years ago:

https://coral.ai/products/

einpoklum | 3 years ago:

I hate the enormous waste of human ability, ingenuity and effort in the creation of proprietary technologies like this. You've made a chip? Offer it for everyone to use. The same goes for Amazon and Apple. It's not as though it's a chip that's only usable for Google-specific work.

joiojoio | 3 years ago:

So you don't have to be frustrated anymore.

alfor | 3 years ago:

Because Google is a monopoly, it doesn't operate on normal economic incentives. Its only goal is to keep the monopoly: appear benign, keep technical advances in house, share its spying network with the government so the government doesn't regulate them. Win-win.

omeysalvi | 3 years ago:

If people are not allowed to monetize their innovations, there is no incentive to innovate. While this needs to have its limits, sharing it with everyone immediately upon creation is not the answer.

unknown | 3 years ago:

[deleted]
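The "57.8% of peak" figure quoted at the top of the thread can be turned into an absolute throughput with a quick back-of-the-envelope calculation. This is only a sketch: the 275 bf16 TFLOPS per-chip peak and the 6,144-chip pod size for the PaLM run are assumptions taken from public reporting on TPU v4 and PaLM, not figures stated in this thread.

```python
# Back-of-the-envelope: aggregate throughput implied by the quoted
# "57.8% of peak" sustained-utilization figure. The hardware numbers
# below are assumptions (see lead-in), not values from the thread.

PEAK_TFLOPS_PER_CHIP = 275.0  # assumed TPU v4 per-chip peak (bf16)
NUM_CHIPS = 6144              # assumed pod size for the PaLM run
UTILIZATION = 0.578           # sustained fraction quoted in the thread

peak_tflops = PEAK_TFLOPS_PER_CHIP * NUM_CHIPS
sustained_tflops = peak_tflops * UTILIZATION

print(f"aggregate peak: {peak_tflops / 1e6:.3f} exaFLOPS")
print(f"sustained:      {sustained_tflops / 1e6:.3f} exaFLOPS")
```

Under these assumptions the pod peaks at roughly 1.7 exaFLOPS and sustains just under 1 exaFLOPS, which gives a sense of why holding 57.8% for 50 days is called "remarkable" above.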
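For readers unfamiliar with the "3D torus" mentioned in the quoted passage: in the baseline (untwisted) form, every node has six neighbors, one in each direction along each axis, with coordinates wrapping around at the edges. The sketch below illustrates only that plain structure; the twisted variant and the OCS-based reconfiguration are beyond it, and the `torus_neighbors` helper is a hypothetical illustration, not part of any TPU software.

```python
# Sketch: neighbor addresses in a plain 3D torus. Each node (x, y, z)
# links to the adjacent node in both directions along each axis; the
# wraparound at the edges is what makes it a torus rather than a mesh.

def torus_neighbors(node, dims):
    """Return the six neighbors of `node` in a 3D torus of shape `dims`."""
    x, y, z = node
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    ]

# A corner node in an 8x8x8 torus still has six neighbors, three of
# them reached via wraparound:
print(torus_neighbors((0, 0, 0), (8, 8, 8)))
```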