ddren | 3 years ago
The Python implementation [1] ran some tests using the same quantization algorithm as llama.cpp (4-bit RTN).
[1] https://github.com/qwopqwop200/GPTQ-for-LLaMa
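(For context, round-to-nearest is simple enough to sketch in a few lines. The snippet below is a minimal per-row asymmetric 4-bit RTN in NumPy; it is an illustrative sketch only, not the actual llama.cpp or GPTQ-for-LLaMa code, and the function names and per-row min/max scaling scheme are assumptions made for the example.)

    import numpy as np

    # Hypothetical sketch of per-row asymmetric 4-bit round-to-nearest (RTN)
    # quantization; not the llama.cpp or GPTQ-for-LLaMa implementation.
    def rtn_quantize_4bit(w):
        wmin = w.min(axis=1, keepdims=True)
        wmax = w.max(axis=1, keepdims=True)
        scale = (wmax - wmin) / 15.0           # 4 bits -> 16 levels (0..15)
        scale[scale == 0] = 1.0                # guard rows that are constant
        zero = np.round(-wmin / scale)         # per-row zero point
        q = np.clip(np.round(w / scale) + zero, 0, 15).astype(np.uint8)
        return q, scale, zero

    def rtn_dequantize(q, scale, zero):
        # Reconstruct approximate weights; the rounding error introduced here
        # is what surfaces as the perplexity gap discussed in this thread.
        return (q.astype(np.float32) - zero) * scale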
nshm | 3 years ago
Great, thanks a lot. So we have numbers on PTB: original perplexity 8.79, quantized 9.68, already about 10% worse. And the PPL is reported per token, I suppose? Word-level PPL for PTB should be around 20, not less than 10.
Any numbers on more complex tasks, like QA?
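(To make the per-token vs. per-word point concrete: both perplexities come from the same total log-likelihood, only the normalizer differs, so word-level PPL = token-level PPL ** (tokens per word). Assuming roughly 1.3 tokens per word, an illustrative figure rather than one from the thread, the reported numbers do land near the ~20 word-level PPL expected for PTB:)

    def word_ppl_from_token_ppl(token_ppl, tokens_per_word):
        # exp(NLL / n_words) = exp((NLL / n_tokens) * (n_tokens / n_words))
        #                    = token_ppl ** tokens_per_word
        return token_ppl ** tokens_per_word

    print(word_ppl_from_token_ppl(8.79, 1.3))  # ~16.9 (original)
    print(word_ppl_from_token_ppl(9.68, 1.3))  # ~19.1 (4-bit RTN)

So the reported figures are consistent with per-token PPL, as suspected.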