(no title)

tzm | 1 year ago

Cerebras WSE-3 contains 4 trillion transistors, with 8 exaflops and 20 PB/s of bandwidth. It has 900,000 cores, 62 times as many as an H100. I wonder whether the WSE-3 can compete on price/performance, though. Interesting times!

jsheard|1 year ago

Is anyone actually using those WSEs in anger yet? They're on their third generation now, but as far as I can tell the discussion of each generation consists of "Cerebras announces new giant chip" and then radio silence until they announce the next giant chip.

rthnbgrredf|1 year ago

The problem is software. You can put out an XYZ-trillion monster chip that beats anything hardware-wise, but it is going nowhere if you don't have the tooling and a massive community (like Nvidia has) to actually do some real AI work.

IshKebab|1 year ago

Unlikely. They cost so much that nobody is going to do research on them - at best it's porting existing models. And they're so different to GPUs that the porting effort is going to be enormous.

They also suffer from the global optimisation problem for the layout of calculations, so compile time is going to be insane.

Their WSE technology is also already obsolete - Tesla's chip does it in a much more logical and cost-effective way.

JonChesterfield|1 year ago

They sold some. That's not, strictly speaking, the same as anyone using them, but there's a decent chance some code is running on the machines.

shrubble|1 year ago

The Cerebras-2 is at the Pittsburgh Supercomputing Center. Not sure if they ordered a 3.

monocasa|1 year ago

> 62 times the cores of an H100.. 900,000.

More than that, arguably. CUDA cores are more like SIMD lanes than CPU cores, whereas Cerebras uses 'core' in something closer to the CPU sense. Since Cerebras cores have 4-wide tensor ops, there are arguably 3.6M CUDA-equivalent cores.
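
For anyone checking the arithmetic above, here is a minimal sketch that uses only the numbers quoted in this thread (900,000 cores, the 62x ratio, 4-wide tensor ops). The variable names are made up for illustration, and the H100 core count it prints is just what the 62x figure implies, not a spec-sheet value.

```python
# Numbers as quoted in the thread; none are verified against vendor specs here.
WSE3_CORES = 900_000        # cores claimed for the WSE-3
RATIO_VS_H100 = 62          # "62 times the cores of an H100"
TENSOR_OP_WIDTH = 4         # "4 wide tensor ops" per Cerebras core

# Counting each lane of a 4-wide op as one CUDA-style "core".
cuda_equivalent_cores = WSE3_CORES * TENSOR_OP_WIDTH

# H100 core count implied by the 62x ratio (a derived figure, not a spec).
implied_h100_cores = WSE3_CORES / RATIO_VS_H100

print(f"CUDA-equivalent cores: {cuda_equivalent_cores:,}")
print(f"Implied H100 cores: {implied_h100_cores:,.0f}")
```

With lanes counted this way, the ratio to an H100 would be roughly 4 x 62 rather than 62.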

AnimalMuppet|1 year ago

9 trillion flops per core? That's... mind-boggling. Is that real?

And, 9 trillion flops per core in 4.4 million transistors per core. That sounds a bit too good to be true.
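
The per-core figures above follow from dividing the totals in the top comment by the core count. A quick sketch of that division is below; it takes the thread's 8 exaflops, 4 trillion transistors and 900,000 cores at face value, which is an assumption rather than a checked spec.

```python
# Totals as quoted in the top comment (taken at face value, not verified).
TOTAL_FLOPS = 8e18          # "8 exaflops"
TOTAL_TRANSISTORS = 4e12    # "4 trillion transistors"
CORES = 900_000             # "900,000" cores

# Naive per-core split: assumes every transistor and every flop belongs to a core.
flops_per_core = TOTAL_FLOPS / CORES
transistors_per_core = TOTAL_TRANSISTORS / CORES

print(f"{flops_per_core / 1e12:.1f} trillion flops per core")
print(f"{transistors_per_core / 1e6:.1f} million transistors per core")
```

If either total actually describes something other than a single chip, the per-core numbers change accordingly.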

winwang|1 year ago

How's the single core-to-core bandwidth?