KVFinn | 3 years ago
It looks like this is regular 4-bit and not GPTQ 4-bit? It's possible there's quality loss but we'll have to test.
>4-bit quantization tends to come at a cost of substantial output quality losses. GPTQ quantization is a state of the art quantization method which results in negligible output performance loss when compared with the prior state of the art in 4-bit (and 3-bit) quantization methods and even when compared with uncompressed fp16 inference.
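For anyone wondering what "regular" 4-bit would mean here: presumably plain round-to-nearest, where each weight is snapped to the closest of 16 levels independently. Below is a minimal sketch of that, assuming one min/max scale per group of weights. The function names are hypothetical and not taken from any library, and the weights are random stand-ins. As I understand it, GPTQ differs in that it also adjusts the not-yet-quantized weights to compensate for each rounding error, which this sketch does not do.

```python
import random

def quantize_rtn_4bit(weights):
    """Round-to-nearest 4-bit quantization with one scale/offset for the whole group."""
    lo, hi = min(weights), max(weights)
    # 4 bits give 16 levels (codes 0..15); fall back to 1.0 if all weights are equal.
    scale = (hi - lo) / 15 or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Map 4-bit codes back to approximate float weights."""
    return [c * scale + lo for c in codes]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(128)]  # stand-in for one weight group

codes, scale, lo = quantize_rtn_4bit(weights)
restored = dequantize(codes, scale, lo)

# Each weight is off by at most half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"step size: {scale:.4f}, max abs error: {max_err:.4f}")
```

The per-weight error is bounded by half the step size, but those errors add up across a layer, which is where the quality loss would come from.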