Why cut up a wafer of chips, package each one with HBM, put the package on a board, connect the boards to CPUs over a fabric, and then tie them all back together with networking chips and cables? That's Cerebras's pitch for keeping the wafer whole. Their three new clusters are claimed to be the three biggest AI training platforms in the world. Comparing the WSE-3 against Nvidia's H100: "It's got 52 times more cores, 800 times more memory on chip, 7,000 times more memory bandwidth, and more than 3,700 times more fabric bandwidth." They don't sell the chips, though; they build the clusters and sell the compute power, except for a couple of systems they built in Dubai.
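A quick sanity check on those multipliers, using spec-sheet numbers I believe are roughly right but which are my assumptions, not figures from the comment (WSE-3: ~900,000 cores, 44 GB of on-chip SRAM, 21 PB/s memory bandwidth; H100: 16,896 FP32 cores, 50 MB of L2 cache, ~3 TB/s of HBM3 bandwidth):

```python
# Back-of-the-envelope check of the quoted multipliers.
# All spec numbers below are assumptions from public spec sheets,
# not from the comment itself.
wse3 = {"cores": 900_000, "on_chip_bytes": 44e9, "mem_bw_bytes_s": 21e15}
h100 = {"cores": 16_896, "on_chip_bytes": 50e6, "mem_bw_bytes_s": 3e12}

for key in wse3:
    print(f"{key}: {wse3[key] / h100[key]:,.0f}x")
# cores: ~53x, on-chip memory: ~880x, memory bandwidth: ~7,000x --
# in the same ballpark as the quoted 52x / 800x / 7,000x.
# The fabric multiplier is omitted: it depends on which NVLink
# aggregate you compare against, so I can't reproduce it cleanly.
```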
Plus, newer models are moving from Transformers toward Mamba-style architectures, which don't need as much memory: they compress the context into a fixed-size state, keeping the important information rather than caching everything they've seen so far.
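To make that memory argument concrete, here's a minimal sketch; all model dimensions are hypothetical, chosen only to show the scaling, not taken from any real model:

```python
# Why a Transformer's inference memory grows with context length
# while a Mamba-style state-space model's does not.
# All sizes below are hypothetical illustration values.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Transformer KV cache: K and V tensors per layer, one entry per
    token seen so far -- memory is linear in context length."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def ssm_state_bytes(n_layers=32, d_inner=4096, d_state=16, bytes_per_elem=2):
    """Mamba-style recurrent state: a fixed-size summary per layer,
    independent of how many tokens have been processed."""
    return n_layers * d_inner * d_state * bytes_per_elem

for seq_len in (1_000, 100_000, 1_000_000):
    kv = kv_cache_bytes(seq_len) / 2**30
    ssm = ssm_state_bytes() / 2**30
    print(f"{seq_len:>9} tokens: KV cache {kv:8.2f} GiB vs SSM state {ssm:.3f} GiB")
```

The point is the growth rate: the KV cache scales linearly with context length (over 100 GiB at a million tokens in this sketch), while the recurrent state stays constant at a few MiB.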