top | item 44285396

fentonc | 8 months ago

I think a quad-CPU Cray X-MP is probably the first computer that could have run (not train!) a reasonably impressive LLM if you could magically transport one back in time. It supported a 4 GB (512 MWord) SRAM-based "SSD" (Solid-state Storage Device) with a transfer bandwidth of 2 GB/s, and delivered about 800 MFLOPS across its CPUs on something like a big matmul. You could probably run a 7B-parameter model with 4-bit quantization on it with careful programming, and get a token every couple of seconds.
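A back-of-the-envelope sketch of that claim, using only the figures in the comment (800 MFLOPS sustained, 2 GB/s SSD bandwidth, 4 GB SSD capacity) and the usual ~2 FLOPs per parameter per generated token; the memory side supports roughly a token every couple of seconds, while the compute side at that rate would be slower unless the 4-bit arithmetic cost is correspondingly reduced:

```python
# Figures taken from the comment above; FLOP-per-token factor is an assumption.
PARAMS = 7e9          # 7B-parameter model
BITS = 4              # 4-bit quantization
FLOPS = 800e6         # sustained matmul rate, FLOPs/s
BANDWIDTH = 2e9       # SSD transfer rate, bytes/s

weights_bytes = PARAMS * BITS / 8      # 3.5 GB -- fits in the 4 GB SSD
t_memory = weights_bytes / BANDWIDTH   # streaming all weights once per token
t_compute = 2 * PARAMS / FLOPS         # ~2 FLOPs per parameter per token

print(f"weights:       {weights_bytes / 1e9:.2f} GB")
print(f"memory-bound:  {t_memory:.2f} s/token")
print(f"compute-bound: {t_compute:.1f} s/token")
```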

rahen | 8 months ago

This sounds plausible and fascinating. Let’s see what it would have taken to train a model as well.

Given an estimate of 6 FLOPs per token per parameter, training a 7B parameter model on roughly 300 billion tokens would require about 1.26×10^22 FLOPs. That translates to roughly 500,000 years on an 800 MFLOPS X-MP, far too long to be feasible. Training a 100M parameter model would still take nearly 70 years.
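The estimate above can be reproduced with the standard compute ≈ 6 × params × tokens approximation. The token counts are assumptions: ~300B tokens for the 7B model matches the 1.26×10^22 FLOPs figure, and ~30 tokens per parameter makes the 100M case come out near 70 years:

```python
# Training-time estimate from compute = 6 * N * D FLOPs (token counts assumed).
FLOPS = 800e6              # sustained X-MP rate, FLOPs/s
SECONDS_PER_YEAR = 3.15e7

def train_years(params, tokens):
    flops = 6 * params * tokens
    return flops / FLOPS / SECONDS_PER_YEAR

print(train_years(7e9, 300e9))   # ~500,000 years for the 7B model
print(train_years(100e6, 3e9))   # ~70 years for the 100M model
```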

However, a 7M-parameter model would have required only about six months of training, and a 14M one about a year, so let’s settle on 10 million parameters. That’s already far more reasonable than the 300K model I mentioned earlier.

Moreover, a 10M-parameter model would have been far from useless. It could have performed decent summarization, categorization, basic code autocompletion, and even powered a simple chatbot with a short context, all of that in 1984, which would have been pure sci-fi back then. And pretty snappy too, maybe around 10 tokens per second, if not a little more.
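A quick sanity check of that token rate, again assuming ~2 FLOPs per parameter per generated token on the same 800 MFLOPS machine; the compute-bound ceiling is well above 10 tokens/s, and at 4 bits the weights are only a few megabytes, so bandwidth is no obstacle either:

```python
# Token-rate ceiling for a hypothetical 10M-parameter model (assumed figures).
PARAMS = 10e6      # 10M parameters
FLOPS = 800e6      # sustained X-MP rate, FLOPs/s

tokens_per_s = FLOPS / (2 * PARAMS)   # compute-bound throughput
print(tokens_per_s)                   # well above the ~10 tok/s claimed
```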

Too bad we lacked the datasets and the concepts...