mft_ | 14 days ago

> So how much internal memory does the latest Cerebras chip have? 44GB. This puts OpenAI in kind of an awkward position. 44GB is enough to fit a small model (~20B params at fp16, ~40B params at int8 quantization), but clearly not enough to fit GPT-5.3-Codex. That’s why they’re offering a brand new model, and why the Spark model has a bit of “small model smell” to it: it’s a smaller distil of the much larger GPT-5.3-Codex model.

This doesn't make sense.

1. Nvidia already sells e.g. the H100 with 80GB memory, so having 44GB isn't an advance, let alone a differentiator.

2. As I suspect anyone who's played with open-weights models will attest, there's no way 5.3-Codex-Spark gets this close to top-level performance, and gets sold this way, while fitting in <44GB. Yes, it's weaker, and for sure it's probably a smaller distil, but not by ~two orders of magnitude as suggested (rough arithmetic below).
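
For reference, the arithmetic both sides are leaning on is just parameter count times bytes per parameter. A quick sketch (the helper name and the parameter counts are illustrative, taken from the quoted claim's own ballpark figures, not known sizes for any OpenAI model):

    # Weight footprint = parameter count x bytes per parameter.
    # Parameter counts below are the ballpark figures from the quoted claim,
    # not known sizes for any OpenAI model.
    def weight_footprint_gb(params_billion, bytes_per_param):
        return params_billion * 1e9 * bytes_per_param / 1e9

    print(weight_footprint_gb(20, 2))  # ~20B params at fp16 -> ~40 GB
    print(weight_footprint_gb(40, 1))  # ~40B params at int8 -> ~40 GB
    # So "fits in 44GB" caps the weights at roughly 20B (fp16) or 40B (int8) params,
    # before counting KV cache and activations.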

EdNutting | 14 days ago

You’re mixing up HBM and SRAM - which is an understandable confusion.

NVIDIA chips use HBM (High Bandwidth Memory) which is a form of DRAM - each bit is stored using a capacitor that has to be read and refreshed.

Most chips have caches on them built out of SRAM - each bit is stored by a feedback loop of transistors.

The big differences are in access time, power and density: SRAM is ~100 times faster than DRAM but DRAM uses much less power per gigabyte, and DRAM chips are much smaller per gigabyte of stored data.

Most processors have a few MB of SRAM as caches. Cerebras is kind of insane in that they’ve built one massive wafer-scale chip with a comparative ocean of SRAM (44GB).

In theory that gives them a big performance advantage over HBM-based chips.

As with any chip design though, it really isn’t that simple.
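
To put rough numbers on that in-theory advantage: for single-stream decoding, every generated token has to stream the full set of weights past the compute, so memory bandwidth sets a ceiling on tokens/s. A minimal sketch, with approximate public bandwidth figures used purely for illustration:

    # Rough ceiling on single-stream decode speed: tokens/s <= memory_bw / model_bytes,
    # since each generated token requires reading all the weights once.
    # Bandwidth figures below are approximate public numbers, not exact specs.
    def max_tokens_per_s(model_gb, mem_bw_gb_per_s):
        return mem_bw_gb_per_s / model_gb

    hbm_bw = 3_350         # GB/s, roughly an H100's HBM3 bandwidth
    sram_bw = 21_000_000   # GB/s, roughly Cerebras' quoted on-wafer SRAM bandwidth

    print(max_tokens_per_s(40, hbm_bw))   # ~84 tokens/s ceiling for a 40GB model
    print(max_tokens_per_s(40, sram_bw))  # ~525,000 tokens/s ceiling for the same model
    # Real systems batch requests and never reach these ceilings, but the gap is why
    # weights-in-SRAM makes 1,000+ tokens/s per stream plausible.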

stingraycharles | 14 days ago

So what you’re saying is that Cerebras chips offer 44GB of what is comparable to L1 caches, while NVidia is offering 80GB of what is comparable to “fast DRAM”?

mft_ | 14 days ago

Thanks, TIL.

aurareturn | 14 days ago

It does make sense. Nvidia chips do not promise 1,000+ tokens/s. The H100’s 80GB is external HBM, unlike Cerebras’ 44GB of internal SRAM.

The whole reason Cerebras can run inference on a model at thousands of tokens per second is that it hosts the entire model in SRAM.

There are two possible scenarios for Codex Spark:

1. OpenAI designed a model sized to fit within a single 44GB wafer.

2. OpenAI designed a model that requires Cerebras to chain multiple wafer-scale chips together; i.e., an 88GB, 132GB, or 176GB model, or more.

Both options require the entire model to fit inside SRAM.
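
A minimal sketch of what those two scenarios imply, assuming a hypothetical weight footprint and 44GB of SRAM per wafer (nothing here is a known size for Codex Spark):

    import math

    WAFER_SRAM_GB = 44  # SRAM per Cerebras wafer, per the thread

    # Wafers needed to hold a hypothetical weight footprint entirely in SRAM.
    def wafers_needed(model_gb):
        return math.ceil(model_gb / WAFER_SRAM_GB)

    for model_gb in (40, 88, 132, 176):
        print(model_gb, "GB ->", wafers_needed(model_gb), "wafer(s)")
    # Scenario 1 is the single-wafer case; scenario 2 is any of the multi-wafer cases.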

woadwarrior01 | 14 days ago

Let's not forget the KV cache, which also needs a lot of RAM (although not as much as the model weights) and scales up linearly with sequence length.
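
For a sense of scale, a generic estimate for a grouped-query-attention transformer; every shape below is illustrative, since nothing about Codex Spark's architecture is public:

    # KV cache per sequence = 2 (K and V) x layers x kv_heads x head_dim
    #                         x sequence_length x bytes per element.
    # All shapes are made up but plausible for a ~20B-parameter model.
    def kv_cache_gb(layers, kv_heads, head_dim, seq_len, bytes_per_el=2):
        return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_el / 1e9

    print(kv_cache_gb(48, 8, 128, 32_768))   # ~6.4 GB at 32k context (fp16 cache)
    print(kv_cache_gb(48, 8, 128, 131_072))  # ~25.8 GB at 128k context
    # Linear in sequence length, and it has to share the same 44GB with the weights.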