coolvision | 1 year ago

how does it compare to 8-bit/4-bit quantization in terms of speed/accuracy?

kolinko|1 year ago

Hard to say for now. I'm curious as well, but so far I've only run simpler tests because of implementation issues: most test suites are geared towards testing models, not model implementations.

I didn't want to wait any longer with the release, but I hope better tests will be coming soon. Anecdotally, I think 30% effort should be comparable to Q8.

More importantly, this algorithm should work on top of Q8. The quality is not yet certain, though, and I could use help with the implementation.