I do wonder where that extra acuity you get from 1% more shows up in practice.
I hate how I have basically no way to intuitively tell that because of how much of a black box the system is
Well why would Claude know any of this? Obviously it's the wrong criteria. If you have your own dataset to benchmark, created your own calibration for quantization with it. Scientifically, you wouldn't really believe in the whole process of gradient descent if you didn't think tiny differences in these values matter. So...
I think you might be answering to a different person or misunderstanding what I said but you are right that just as I don’t have an intuition for where the acuity shows up in the corpus, I don’t think Claude does either
doctorpangloss|1 day ago
tymscar|1 day ago