top | item 42343886


pulse7 | 1 year ago

Or wait for the IQ2_M quantization of the 70B, which you can run very fast on 24GB of VRAM with a context size of 4096...
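For a rough sanity check on whether that fits, here's a back-of-the-envelope VRAM estimate. The numbers are assumptions, not from the comment: IQ2_M at roughly 2.7 bits/weight, and a Llama-70B-style architecture (80 layers, 8 KV heads via GQA, head dim 128, fp16 KV cache):

```python
# Rough VRAM estimate for a 70B model at IQ2_M quantization.
# Assumed figures: IQ2_M ~= 2.7 bits/weight; Llama-70B-style
# architecture; fp16 KV cache. Treat as a sketch, not a spec.
params = 70e9
bits_per_weight = 2.7
weights_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

n_layers, n_kv_heads, head_dim, ctx = 80, 8, 128, 4096
# K and V caches per layer, fp16 (2 bytes per element)
kv_bytes = 2 * n_layers * ctx * n_kv_heads * head_dim * 2
kv_gb = kv_bytes / 1e9

print(f"weights ~ {weights_gb:.1f} GB, KV cache ~ {kv_gb:.2f} GB")
```

Under these assumptions the weights alone come to roughly 23.6 GB, plus about 1.3 GB of KV cache at 4096 context, so a single 24GB card is tight and may need a few layers offloaded to CPU.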


griomnib | 1 year ago

At some point there’s so much degradation from quantizing that I think an 8B is going to be better for many tasks.