item 38926263 (no title)
reexpressionist | 2 years ago [dead]

teilo | 2 years ago
I'm running it on an M2 Max with 96GB, and have plenty of room to spare. And it's fast. Faster than I can get responses from ChatGPT.

coder543 | 2 years ago
How many tokens/s? Which quantization? If you could test Q4KM and Q3KM, it would be interesting to hear how the M2 Max does!