formalsystem | 1 year ago
So this is gonna be 8-bit weights, 8-bit activations, group size of 256, symmetric quantization. Not sure how to map this to the GGUF variants, because they don't mention whether or how they do activation quantization.
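To make that scheme concrete, here's a rough sketch in plain Python of symmetric 8-bit quantization with one scale per group of 256 values. The function names `quantize_symmetric` and `dequantize` are made up for illustration, and this is not the actual implementation being discussed; a real kernel would work on tensors and would apply the same idea to activations at runtime.

```python
def quantize_symmetric(values, group_size=256, bits=8):
    """Quantize a flat list of floats group by group.

    Returns (integer codes, per-group scales). Symmetric means the range is
    centered on zero, so each group needs only a scale and no zero point.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Guard against an all-zero group to avoid dividing by zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        for v in group:
            q = round(v / scale)
            codes.append(max(-qmax, min(qmax, q)))  # clamp to [-127, 127]
    return codes, scales


def dequantize(codes, scales, group_size=256):
    """Map integer codes back to floats using each group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


# Two groups of 256 values each, so we expect two scales.
weights = [(i - 256) / 100 for i in range(512)]
codes, scales = quantize_symmetric(weights)
recovered = dequantize(codes, scales)
```

The round-trip error for each value is at most half of its group's scale, which is why smaller groups (more scales) generally give better accuracy at the cost of extra storage.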
imjonse | 1 year ago
formalsystem | 1 year ago
So, for example, we can accelerate AWQ and GPTQ by using a fast int4 kernel called tinygemm.