top | item 36219214

(no title)

The fact that this is commodity hardware makes ggml extremely impressive and puts the tech in the hands of everyone. I recently reported my experience running 7B llama.cpp on a 15 year old Core 2 Quad [1] - when that machine came out it was a completely different world and I certainly never imagined how AI would look like today. This was around when the first iPhone was released and everyone began talking about how smartphones would become the next big thing. We saw what happened 15 years later...

Today with the new k-quants users are reporting that 30B models are working with 2-bit quantization on 16GB CPUs and GPUs [2]. That's enabling access to millions of consumers and the optimizations will only improve from there.

[1] https://old.reddit.com/r/LocalLLaMA/comments/13q6hu8/7b_perf...

[2] https://github.com/ggerganov/llama.cpp/pull/1684, https://old.reddit.com/r/LocalLLaMA/comments/141bdll/moneros...

discuss

No comments yet.