top | item 36870809

krychu | 2 years ago

Self-plug. Here’s a fork of the original llama 2 code adapted to run on the CPU or MPS (M1/M2 GPU) if available:

https://github.com/krychu/llama

It runs with the original weights and gets to ~4 tokens/sec on a MacBook Pro M1 with the 7B model.
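The device-fallback idea described above (use the MPS backend when available, otherwise the CPU) looks roughly like this in PyTorch; this is a minimal sketch, not code from the linked repo:

```python
import torch

def pick_device() -> torch.device:
    # torch.backends.mps.is_available() reports whether the Metal
    # Performance Shaders backend (Apple Silicon GPU) can be used.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
# Tensors (and the model) are then allocated on whichever device was picked.
x = torch.ones(2, 2, device=device)
print(device.type)
```

Model weights and inputs both need to be moved to the same device (e.g. `model.to(device)`), or PyTorch will raise a device-mismatch error.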
