I propose a theory: LLMs do not actually require a lot of computing power. What they require is a small computer, but the models are blown out of proportion with unnecessary information to artificially increase demand for high-power hardware. So the reason models are the way they are, especially local models, is that they are deliberately badly optimized. As I said, the complexity is added on purpose to increase sales of hardware. What are your pros and what are your cons? And do not refer to secondhand information, as in "well I was told" or "well I read a paper".
tim-tday|2 days ago
I’ve built my own LLM containers, I’ve built orchestration systems for fine-tuning and model management, I’ve tried quantized models, and I’ve tested a dozen or so models of different sizes.
You can’t really get around the fact that inference on CPU is slow, and inference on GPU is gated by VRAM (you need about 1 GB of VRAM per billion parameters at 8-bit precision; quantizing below that reduces quality and increases operational toil). If you know of a consumer-level GPU with 80-128 GB of VRAM that I can buy for less than $10k, do please let me know.
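The "1 GB per billion parameters" rule of thumb above can be sketched as simple arithmetic. This is a rough weights-only estimate, assuming bytes-per-parameter by precision; the function name and precision labels are illustrative, and real deployments need additional memory for the KV cache and activations on top of this:

```python
# Approximate bytes per parameter at common precisions (assumption:
# weights only; KV cache and activation memory are not counted).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weights_vram_gb(params_billions: float, precision: str = "int8") -> float:
    """Rough GB of VRAM needed just to hold the model weights."""
    return params_billions * BYTES_PER_PARAM[precision]

# A 70B model at 8-bit needs ~70 GB for weights alone, which is why
# it cannot fit on a typical 24 GB consumer GPU without heavy
# quantization or offloading.
print(weights_vram_gb(70, "int8"))   # 70.0
print(weights_vram_gb(70, "fp16"))   # 140.0
print(weights_vram_gb(70, "int4"))   # 35.0
```

At 4-bit a 70B model squeezes toward 35 GB, which is still beyond single consumer cards and illustrates the quality/VRAM trade-off mentioned above.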
Short of a specific proposal, I’m going to classify your suggestion as not knowing enough about what you’re talking about for it to make any sense.