mft_|14 days ago
This doesn't make sense.
1. Nvidia already sells e.g. the H100 with 80GB of memory, so having 44GB isn't an advance, let alone a differentiator.
2. As I suspect anyone who's played with open-weights models will attest, there's no way that 5.3-Codex-Spark is getting close to top-level performance, and being sold in this way, while fitting in under 44GB. Yes, it's weaker, and it's almost certainly a smaller, distilled model, but not smaller by ~two orders of magnitude, as suggested.
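The size argument can be made concrete with some back-of-the-envelope arithmetic: a model's memory footprint is roughly parameter count times bytes per weight. The precisions below are common conventions, not anything published about Codex Spark, whose actual size and quantization are not public.

```python
# Back-of-the-envelope: how many parameters fit in a given memory
# budget at common weight precisions. Illustrative only; the real
# size and precision of Codex Spark are not public.

def params_that_fit(mem_gb: float, bytes_per_param: float) -> float:
    """Return the parameter count (in billions) that fits in mem_gb."""
    return mem_gb / bytes_per_param

for name, bpp in [("FP16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    print(f"{name}: ~{params_that_fit(44, bpp):.0f}B params in 44 GB")
# FP16: ~22B params in 44 GB
# FP8:  ~44B params in 44 GB
# INT4: ~88B params in 44 GB
```

So 44GB is nowhere near two orders of magnitude below an 80GB card; even at FP16 it holds a ~22B-parameter model.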
EdNutting|14 days ago
NVIDIA chips use HBM (High Bandwidth Memory), which is a form of DRAM: each bit is stored in a capacitor that has to be read and periodically refreshed.
Most chips have caches on them built out of SRAM: a feedback loop of transistors that stores each bit.
The big differences are in access time, power, and density: SRAM is ~100 times faster than DRAM, but DRAM uses much less power per gigabyte, and DRAM chips are much smaller per gigabyte of stored data.
Most processors have a few MB of SRAM as caches. Cerebras is kind of insane in that they’ve built one massive wafer-scale chip with a comparative ocean of SRAM (44GB).
In theory that gives them a big performance advantage over HBM-based chips.
As with any chip design though, it really isn’t that simple.
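Ignoring all of those complications, the SRAM advantage can be sketched as a simple upper bound: for single-stream decode, every generated token has to stream all the weights through the compute units once, so peak tokens/sec is at most memory bandwidth divided by model size. The model size and bandwidth figures below are ballpark assumptions for illustration, not vendor specs.

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound model:
# generating one token requires reading every weight once, so
# tokens/sec <= bandwidth / model size. All numbers are assumptions.

def max_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    """Bandwidth-limited ceiling on single-stream decode throughput."""
    return bandwidth_gb_s / model_gb

model_gb = 40  # hypothetical model footprint

# ~3 TB/s is the right order of magnitude for a modern HBM stack;
# wafer-scale on-chip SRAM is claimed to be orders of magnitude higher.
print(f"HBM  (~3,000 GB/s): {max_tokens_per_sec(model_gb, 3_000):.0f} tok/s ceiling")
print(f"SRAM (~300,000 GB/s): {max_tokens_per_sec(model_gb, 300_000):.0f} tok/s ceiling")
```

Real systems land well below these ceilings (batching, compute limits, interconnect), but the bound shows why an all-SRAM design can hit thousands of tokens per second where HBM designs cannot.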
aurareturn|14 days ago
The whole reason Cerebras can run inference on a model at thousands of tokens per second is that it hosts the entire model in SRAM.
There are two possible scenarios for Codex Spark:
1. OpenAI designed a model to fit in exactly 44GB.
2. OpenAI designed a model that requires Cerebras to chain multiple wafer chips together, i.e., an 88GB, 132GB, or 176GB model, or more.
Both options require the entire model to fit inside SRAM.
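The two scenarios reduce to a ceiling division: how many 44GB wafers does a model of a given size need if the weights must live entirely in SRAM? The model sizes here are the hypothetical multiples from the comment above, not anything OpenAI has disclosed.

```python
# If a model must fit entirely in wafer SRAM, the wafer count is a
# simple ceiling division. Model sizes below are hypothetical.
import math

WAFER_SRAM_GB = 44  # on-chip SRAM per Cerebras wafer

def wafers_needed(model_gb: float) -> int:
    """Minimum number of wafers whose combined SRAM holds the model."""
    return math.ceil(model_gb / WAFER_SRAM_GB)

for size in (44, 88, 132, 176):
    print(f"{size} GB model -> {wafers_needed(size)} wafer(s)")
# 44 GB model -> 1 wafer(s)
# 88 GB model -> 2 wafer(s)
# 132 GB model -> 3 wafer(s)
# 176 GB model -> 4 wafer(s)
```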