top | item 42436785


NotSammyHagar | 1 year ago

I saw your recent post on running Llama 3.3 70B on an M2 Pro with 64 GB. Do the many Apple Silicon variants, with their varying numbers of CPU cores, GPU cores, and Neural Engine cores, matter that much for how fast these LLMs can generate tokens and answer questions? More hardware is always better, but what can we say about how performance scales across the different choices?

64 GB of RAM is crucial; after that you need 1+ TB of storage, and then what?


simonw | 1 year ago

I don't know. I believe memory bandwidth matters, and I got the impression that the M4 series isn't yet as good as the M2 was on that front, but I'm half-remembering things I've heard here.
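To see why memory bandwidth tends to dominate, note that single-batch decoding has to stream essentially all of the model's weights from RAM for every generated token, so bandwidth divided by model size gives a rough ceiling on tokens per second. Here is a back-of-envelope sketch; the bandwidth figures and the 4-bit quantization size are illustrative assumptions, not measurements:

```python
# Rough ceiling for single-batch decode speed: every token requires
# streaming all weights from memory, so
#   tokens/s <= memory_bandwidth / model_size_in_bytes.
# Real throughput will be lower (KV cache reads, compute, overhead).

def est_tokens_per_sec(params_billion: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """Memory-bandwidth-only upper bound on decode tokens/sec."""
    model_gb = params_billion * bytes_per_param  # weight footprint in GB
    return bandwidth_gb_s / model_gb

# Llama 3.3 70B at ~4-bit quantization (~0.5 bytes/param) is ~35 GB.
# Approximate advertised peak bandwidths (assumed figures):
chips = {
    "M2 Pro (~200 GB/s)": 200,
    "M2 Max (~400 GB/s)": 400,
    "M2 Ultra (~800 GB/s)": 800,
}
for name, bw in chips.items():
    print(f"{name}: ~{est_tokens_per_sec(70, 0.5, bw):.1f} tok/s ceiling")
```

This is why doubling GPU core counts alone often helps less than moving up a tier (Pro → Max → Ultra) that doubles the memory bus width: decode speed scales roughly linearly with bandwidth once the model fits in RAM.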