top | item 45429179

ramshanker | 5 months ago

I can't work out what is preventing Cerebras from replacing a few of the cores in the wafer-scale package with HBM. It seems the only constraint on their WSE-3 is memory capacity. Considering the size of NVDA chips, only a small fraction of the wafer area should be enough to exceed the memory footprint of contemporary models.

reliabilityguy|5 months ago

DRAM (the core of HBM) is built on a different process technology than logic and SRAM. Also, stacking that many DRAM dies on the wafer would complicate the packaging quite a bit, I think.

xadhominemx|5 months ago

I don’t think so. The reason why Cerebras is so fast for inference is that the KV cache sits in the SRAM.
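
For a rough sense of scale, here is a back-of-envelope sketch of how much memory a KV cache takes. The formula is the usual one (keys plus values, per layer, per KV head, per token). The model shape below is made up for illustration and is not any specific model, and the 44 GB SRAM figure is the publicly quoted WSE-3 number, taken here as an assumption. It also ignores the weights, which have to live somewhere too.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes for a decoder-only transformer."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical model shape (illustrative only): 80 layers, 8 KV heads,
# head dimension 128, 16-bit cache entries.
per_token = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=1)
per_sequence = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=8192)

# Assumed on-wafer SRAM capacity (publicly quoted WSE-3 figure, in bytes).
SRAM_BYTES = 44 * 10**9

print(f"KV cache per token:        {per_token / 2**10:.0f} KiB")
print(f"KV cache per 8k sequence:  {per_sequence / 2**30:.2f} GiB")
print(f"8k sequences fitting in SRAM (cache only): {SRAM_BYTES // per_sequence}")
```

With these assumed numbers one 8k-token sequence needs about 2.5 GiB of cache, so the SRAM fills up after a handful of concurrent long sequences, which is the capacity side of the trade-off.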

aurareturn|5 months ago

If you replace some cores with HBM on package, you basically get the traditional GPU + HBM model.