keheliya | 1 year ago
Running it on a MacBook Pro entirely locally is possible via Ollama. Even running the full model (680B) is apparently possible distributed across multiple M2 Ultras: https://x.com/awnihannun/status/1881412271236346233

vessenes | 1 year ago
That's a 3-bit quant. I don't think there's a theoretical reason you couldn't run it at fp16, but it would take more than two M2 Ultras. 10 or 11, maybe!

bildung | 1 year ago
Well, there's the practical reason of the model natively being fp8 ;) One of the innovative ideas making it so much less compute-intensive, apparently.

rsanek | 1 year ago
The 70B distilled version that you can run locally is pretty underwhelming, though.
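A back-of-the-envelope check of the numbers in this sub-thread. This is a minimal, weights-only sketch; the 680B parameter count comes from the thread, while the 192 GB of unified memory per M2 Ultra and the neglect of KV cache / activation / runtime overhead are assumptions of mine:

```python
import math

# Assumptions (mine, not from the thread): 680e9 parameters,
# 192 GB unified memory per M2 Ultra, and weights dominating
# the footprint (no KV cache, activations, or runtime overhead).
PARAMS = 680e9
GIB = 1024**3
M2_ULTRA_RAM_GIB = 192

def footprint(bits_per_weight: float) -> tuple[float, int]:
    """Weights-only size in GiB, and M2 Ultras needed to hold it."""
    gib = PARAMS * bits_per_weight / 8 / GIB
    return gib, math.ceil(gib / M2_ULTRA_RAM_GIB)

for label, bits in [("3-bit quant", 3), ("fp8 (native)", 8), ("fp16", 16)]:
    gib, n = footprint(bits)
    print(f"{label:12}: ~{gib:,.0f} GiB -> {n} M2 Ultra(s)")
```

This prints roughly 237 GiB / 2 machines for the 3-bit quant (consistent with the linked two-M2-Ultra demo), 633 GiB / 4 machines at fp8, and 1,267 GiB / 7 machines at fp16 for the weights alone; vessenes's "10 or 11" estimate is plausible once cache and distribution overhead are added on top.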