I saw your recent post on running Llama 3.3 70B on an M2 Pro with 64 GB. Do the many Apple Silicon variants, with their varying numbers of CPU cores, GPU cores, and Neural Engine cores, matter that much for how fast these LLMs can generate tokens and answer questions? More hardware is always better, but what can we say about how performance scales across the different choices? 64 GB of RAM seems crucial, and after that you need 1+ TB of storage, and then what?
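A rough back-of-envelope, assuming token generation is memory-bandwidth bound (each decoded token requires streaming all model weights through the memory bus once): tokens/sec is roughly bandwidth divided by model size, so the chip tiers mostly matter through their bandwidth, not core counts. The bandwidth and model-size figures below are illustrative assumptions, not benchmarks:

```python
def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound decode speed if every token reads all weights once."""
    return bandwidth_gb_s / model_size_gb

# Assumed figures: ~40 GB for a 4-bit quantized 70B model; approximate
# memory bandwidths from Apple's published specs (illustrative only).
chips = [("M2 Pro", 200), ("M2 Max", 400), ("M2 Ultra", 800)]
for name, bw in chips:
    print(f"{name}: ~{est_tokens_per_sec(bw, 40):.1f} tok/s upper bound")
```

By this estimate an M2 Pro tops out around 5 tok/s on a 4-bit 70B model, and doubling the bandwidth roughly doubles decode speed; prompt processing, by contrast, is compute-bound and scales more with GPU cores.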
simonw|1 year ago
alberth|1 year ago