dangelov | 2 years ago

I've used Ollama to run Llama 2 (all variants) on my 2020 Intel MacBook Pro - it's incredibly easy. You just install the app and run a couple of shell commands. I'm guessing soon-ish this model will be available too and then you'd be able to use it with the Continue VS Code extension.

Edited to add: Though somewhat slow, swap seems to have been a good enough substitute for the RAM these models nominally require. Ollama says "32 GB to run the 13B models", but I'm running the llama2:13b model on a 16 GB MBP.
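
For anyone curious, the whole workflow is roughly this (a rough sketch; llama2:13b is the tag I'm actually running, and "codellama" is only my guess at what the tag for this new model might be once Ollama picks it up):

    # install the app from https://ollama.ai, then in a terminal:
    ollama run llama2:13b   # pulls the weights on first run, then drops into an interactive prompt
    ollama run codellama    # hypothetical tag for this new model, whatever Ollama ends up publishing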

j45 | 2 years ago

Apple Silicon, especially an M1 Max Studio, seems like an interesting machine to hang on to as the models become more efficient and need less and less memory.

If there are any other opinions or thoughts on this, I'd be very happy to learn as well. I have also considered the eGPU route, connected to a 1L PC such as a ThinkCentre M80/90.

BoorishBears | 2 years ago

I have a 64 GB M1 Max MBP, and I'd say that unless you really have some academic interest in messing with open models, accessing SOTA models via a REST API currently gives better latency for a given quality.

Claude Instant 1.2 is as fast as GPT-3.5, follows instructions at a quality closer to GPT-4, and has a 100k context window. It's hard for an open-source model to compete with that right now.
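
For context, "accessing via a REST API" there is one HTTP call. Something like this (a sketch of Anthropic's text-completions endpoint as I remember it; double-check the headers and field names against their docs):

    # Anthropic's completions endpoint as of mid-2023; field names may have changed since
    curl https://api.anthropic.com/v1/complete \
      -H "x-api-key: $ANTHROPIC_API_KEY" \
      -H "anthropic-version: 2023-06-01" \
      -H "content-type: application/json" \
      -d '{
        "model": "claude-instant-1.2",
        "max_tokens_to_sample": 512,
        "prompt": "\n\nHuman: Explain what this function does...\n\nAssistant:"
      }'

No quantizing, no swap, and the latency is better than anything I can get locally at comparable quality.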