teilo | 2 years ago

I'm running it on an M2 Max with 96GB, and have plenty of room to spare. And it's fast. Faster than I can get responses from ChatGPT.

coder543|2 years ago

How many tokens/s? Which quantization? If you could test Q4_K_M and Q3_K_M, it would be interesting to hear how the M2 Max does!

teilo|2 years ago

No quantization (Q8_0). The full 48GB model. As for token count, I haven't tested it on more than 200 or so.
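
For anyone who wants to put a number on the tokens/s question, here is a minimal sketch of how throughput could be measured. It assumes you have some callable that takes a prompt and returns the generated tokens; `generate`, `time_generation`, and `tokens_per_second` are hypothetical names for illustration, not part of any particular runtime.

```python
import time


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Return generation throughput for a single run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def time_generation(generate, prompt: str):
    """Time one call to a caller-supplied generate(prompt).

    `generate` is a hypothetical stand-in for whatever runs the model;
    it is assumed to return a sequence of generated tokens.
    Returns (number_of_tokens, elapsed_seconds).
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens), elapsed
```

As a purely illustrative figure (not a measurement from this thread), a 200-token response that took 10 seconds would give `tokens_per_second(200, 10.0)`, i.e. 20 tokens/s. Note that timing the whole call lumps prompt processing together with generation, so longer prompts will make the number look lower.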