keheliya | 1 year ago
Running it on a MacBook Pro entirely locally is possible via Ollama. Even running the full model (680B) is apparently possible distributed across multiple M2 Ultras: https://x.com/awnihannun/status/1881412271236346233

vessenes | 1 year ago
That's a 3-bit quant. I don't think there's a theoretical reason you couldn't run it at fp16, but it would take more than two M2 Ultras. 10 or 11, maybe!

bildung | 1 year ago
Well, there's the practical reason of the model natively being fp8 ;) One of the innovative ideas making it so much less compute-intensive, apparently.

rsanek | 1 year ago
The 70B distilled version that you can run locally is pretty underwhelming, though.
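A back-of-the-envelope check of the numbers in this sub-thread. This is a minimal, weights-only sketch; the 680B parameter count comes from the thread, while the 192 GB of unified memory per M2 Ultra and the neglect of KV cache / activation / runtime overhead are assumptions of mine:

```python
import math

# Assumptions (mine, not from the thread): 680e9 parameters,
# 192 GB unified memory per M2 Ultra, and weights dominating
# the footprint (no KV cache, activations, or runtime overhead).
PARAMS = 680e9
GIB = 1024**3
M2_ULTRA_RAM_GIB = 192

def footprint(bits_per_weight: float) -> tuple[float, int]:
    """Weights-only size in GiB, and M2 Ultras needed to hold it."""
    gib = PARAMS * bits_per_weight / 8 / GIB
    return gib, math.ceil(gib / M2_ULTRA_RAM_GIB)

for label, bits in [("3-bit quant", 3), ("fp8 (native)", 8), ("fp16", 16)]:
    gib, n = footprint(bits)
    print(f"{label:12}: ~{gib:,.0f} GiB -> {n} M2 Ultra(s)")
```

This prints roughly 237 GiB / 2 machines for the 3-bit quant (consistent with the linked two-M2-Ultra demo), 633 GiB / 4 machines at fp8, and 1,267 GiB / 7 machines at fp16 for the weights alone; vessenes's "10 or 11" estimate is plausible once cache and distribution overhead are added on top.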