item 38926481

teilo | 2 years ago
I'm running it on an M2 Max with 96GB and have plenty of room to spare. And it's fast: faster than I can get responses from ChatGPT.

coder543 | 2 years ago
How many tokens/s? Which quantization? If you could test Q4_K_M and Q3_K_M, it would be interesting to hear how the M2 Max does!

teilo | 2 years ago
No heavy quantization (Q8_0): the full 48GB model. As for token count, I haven't tested it on more than 200 or so.