jackcosgrove | 1 day ago
I then discovered what quantization is by reading a blog post about binary quantization. That seemed too good to be true, so I asked Claude to design an analysis assessing the fidelity of 1-, 2-, 4-, and 8-bit quantization. Claude did a good job, downloading 10,000 embeddings from a public source and computing a similarity score and correlation coefficient for each level of quantization against the float32 source of truth. The 1- and 2-bit quantizations were about 90% similar, and 8-bit quantization was lossless at the precision Claude used to display the results. 4-bit was interesting: it was 99% similar (almost lossless) yet half the size of 8-bit. It seemed like the sweet spot.
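A minimal sketch of that fidelity check, with random vectors standing in for the downloaded embeddings (which aren't reproduced here) and simple uniform range quantization assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
# Random vectors stand in for the 10,000 downloaded embeddings.
emb = rng.standard_normal((1000, 384)).astype(np.float32)

def quantize(x, bits):
    """Uniform quantization to 2**bits levels over the data range,
    then dequantize back to float for comparison."""
    levels = 2 ** bits
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (levels - 1)
    q = np.round((x - lo) / scale)
    return (q * scale + lo).astype(np.float32)

def mean_cosine_sim(a, b):
    """Average per-row cosine similarity between two matrices."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a_n * b_n, axis=1)))

for bits in (1, 2, 4, 8):
    sim = mean_cosine_sim(emb, quantize(emb, bits))
    print(f"{bits}-bit vs float32: mean cosine similarity {sim:.4f}")
```

With Gaussian data the similarity climbs quickly with bit depth, which matches the shape of the result described above.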
This analysis took me all of an hour so I thought, "That's cool but is it real?" It's gratifying to see that 4 bit quantization is actually being used by professionals in this field.
deepsquirrelnet | 1 day ago
It doesn't seem terribly common yet, though. I think it is challenging to keep it stable.
[1] https://www.opencompute.org/blog/amd-arm-intel-meta-microsof...
[2] https://www.opencompute.org/documents/ocp-microscaling-forma...
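The microscaling formats in [2] share one scale per small block of elements, which is one way to keep low-bit quantization stable. A toy version of the idea (plain signed-integer elements and float scales, not the spec's actual FP4/FP6/FP8 element types and power-of-two E8M0 scales):

```python
import numpy as np

def block_quantize(x, block=32, bits=4):
    """Toy block-scaled ("microscaling"-style) quantization: each
    contiguous block of `block` values shares one scale, and elements
    are rounded to signed integers. Simplified relative to the OCP MX
    spec; len(x) must be divisible by `block`."""
    blocks = x.reshape(-1, block)
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit signed
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                     # all-zero block guard
    q = np.clip(np.round(blocks / scales), -qmax, qmax)
    return (q * scales).reshape(x.shape)          # dequantized copy

rng = np.random.default_rng(1)
v = rng.standard_normal(4096).astype(np.float32)
v_hat = block_quantize(v)
print("max abs error:", float(np.abs(v_hat - v).max()))
```

Because each block gets its own scale, one outlier only distorts its own 32 neighbors instead of the whole tensor, which is the stability argument for these formats.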
jackcosgrove | 1 day ago
https://modernaicourse.org/
tymscar | 1 day ago
I do wonder where the extra acuity you get from that last 1% shows up in practice. I hate that I have basically no way to tell intuitively, because of how much of a black box the system is.