Will a reranker trained with MSE be better calibrated than one trained with InfoNCE? Will thresholding on reranker scores be more useful in RAG applications?
npip99|7 months ago
We found that MSE after Elo adjustment worked equally well. MSE also lets you shuffle (q, d) pairs across the whole dataset, which has good statistical properties (versus contrastive training, which makes you sample the same query many times within a single minibatch).
In this case "InfoNCE" isn't applicable because the reranker's output is a scalar, not a vector. That's why we checked both Bradley-Terry and MSE.
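To make the batching difference concrete, here is a rough sketch (not the actual training code). It assumes each (q, d) pair carries a scalar target such as an Elo-derived score. The toy data and the names `examples`, `mse_loss`, `bradley_terry_loss`, `pointwise_batches`, and `pairwise_examples` are all made up for illustration.

```python
import math
import random

# Hypothetical toy data: (query_id, doc_id, target), where target stands in
# for an Elo-derived relevance score. Not real data.
examples = [
    ("q1", "d1", 0.9), ("q1", "d2", 0.2), ("q1", "d3", 0.5),
    ("q2", "d4", 0.7), ("q2", "d5", 0.1),
]

def mse_loss(preds, targets):
    """Pointwise loss: each (q, d) is scored against its own scalar target."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def bradley_terry_loss(s_winner, s_loser):
    """Pairwise loss: -log(sigmoid(s_winner - s_loser)), as a stable softplus."""
    x = s_loser - s_winner
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pointwise_batches(data, batch_size, seed=0):
    """MSE batching: shuffle (q, d) pairs freely across the whole dataset."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

def pairwise_examples(data):
    """Bradley-Terry batching: every comparison needs two docs for the SAME query."""
    by_query = {}
    for q, d, t in data:
        by_query.setdefault(q, []).append((d, t))
    pairs = []
    for q, docs in by_query.items():
        for d_a, t_a in docs:
            for d_b, t_b in docs:
                if t_a > t_b:
                    pairs.append((q, d_a, d_b))  # (query, winner, loser)
    return pairs
```

With the pointwise loss, a batch can mix queries arbitrarily. The pairwise loss only makes sense within a query, so the same query necessarily shows up repeatedly in a batch.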