tylerneylon|1 year ago
This recent work is highly relevant: https://learnandburn.ai/p/how-to-tell-if-an-llm-is-just-gues...
It uses an idea called semantic entropy, which is more sophisticated than the standard entropy of the token logits and better suited as a statistical quantification of when an LLM is guessing versus highly certain. The original paper, by authors from Oxford, was published in Nature.
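For intuition: semantic entropy samples several answers to the same prompt, clusters them by meaning (the paper uses bidirectional NLI entailment to decide equivalence), and measures entropy over those meaning clusters rather than over raw token sequences. A minimal sketch, with the equivalence check left as a pluggable stand-in:

```python
import math

def semantic_entropy(samples, are_equivalent):
    """Entropy over *meaning* clusters rather than raw token sequences.

    samples: list of (answer_text, sequence_log_prob) pairs drawn from the LLM.
    are_equivalent: callable(a, b) -> bool deciding semantic equivalence
        (the paper uses bidirectional NLI entailment; this is a stand-in).
    """
    # Greedily cluster sampled answers into meaning classes.
    clusters = []  # list of lists of (text, logp)
    for text, logp in samples:
        for cluster in clusters:
            if are_equivalent(cluster[0][0], text):
                cluster.append((text, logp))
                break
        else:
            clusters.append([(text, logp)])

    # Probability mass of each cluster = sum of member sequence probabilities,
    # renormalized over the drawn samples.
    masses = [sum(math.exp(lp) for _, lp in c) for c in clusters]
    total = sum(masses)
    probs = [m / total for m in masses]

    # Standard Shannon entropy, but over meaning clusters.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy usage: three samples, two distinct meanings -> low but nonzero entropy.
samples = [("Paris", -0.2), ("It's Paris.", -0.9), ("Lyon", -2.0)]
print(semantic_entropy(samples, lambda a, b: ("Paris" in a) == ("Paris" in b)))
```

The point is that "Paris" and "It's Paris." count as one answer, so rephrasings don't inflate the uncertainty estimate the way raw token-level entropy would.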
vark90|1 year ago
But even with this in mind, there are caveats. We recently published [2] a comprehensive benchmark of SOTA approaches to estimating the uncertainty of LLMs, and found that while these semantics-aware methods do perform very well in many cases, on other tasks simple baselines, like the average entropy of the token distributions (sketched below), perform on par with or better than the complex techniques.
We have also developed an open-source Python library [3] (still in early development) that implements all modern uncertainty estimation (UE) techniques applicable to LLMs, and makes it easy both to benchmark UE methods against each other and to estimate output uncertainty for deployed models in production.
[1] https://arxiv.org/abs/2307.01379
[2] https://arxiv.org/abs/2406.15627
[3] https://github.com/IINemo/lm-polygraph
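To make the simple baseline concrete: given the per-step logits of a generation, "average token entropy" is just the mean Shannon entropy of the next-token distributions. A minimal sketch in plain PyTorch (this shows the idea only, not lm-polygraph's actual API):

```python
import torch
import torch.nn.functional as F

def mean_token_entropy(logits: torch.Tensor) -> float:
    """Average Shannon entropy of the next-token distributions.

    logits: (seq_len, vocab_size) tensor, one row per generated position.
    Higher values suggest the model was less certain while generating.
    """
    log_probs = F.log_softmax(logits, dim=-1)           # (seq_len, vocab)
    probs = log_probs.exp()
    token_entropies = -(probs * log_probs).sum(dim=-1)  # (seq_len,)
    return token_entropies.mean().item()

# Toy usage with random logits standing in for a model's output.
print(mean_token_entropy(torch.randn(10, 32000)))
```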
mikkom|1 year ago
https://x.com/_xjdr
I have been following this quite closely, and it has been very interesting: it seems smaller models can be more efficient with this sampler. The posts are worth going through if you're interested in this. I have a feeling this kind of sampling is a big deal.
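As I understand the entropix posts, the core move is to compute the entropy (and "varentropy") of the next-token distribution and branch the decoding strategy on it: decode greedily when the model is confident, sample more aggressively when it isn't. A rough sketch of that gating idea; the thresholds and branches here are illustrative, not the actual entropix logic:

```python
import torch
import torch.nn.functional as F

def entropy_gated_sample(logits: torch.Tensor, low: float = 0.5, high: float = 3.0):
    """Pick a decoding move based on how uncertain the next-token distribution is.

    logits: (vocab_size,) for the next position. The `low`/`high` thresholds
    are made up for illustration.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum()
    # "Varentropy": variance of per-token surprisal around the entropy.
    varentropy = (probs * (-log_probs - entropy) ** 2).sum()

    if entropy < low and varentropy < low:
        return torch.argmax(logits).item()          # confident: decode greedily
    elif entropy > high:
        hot = F.softmax(logits / 1.5, dim=-1)       # uncertain: sample hotter
        return torch.multinomial(hot, 1).item()
    else:
        return torch.multinomial(probs, 1).item()   # in between: plain sampling
```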
weitendorf|1 year ago
I don't say that to be a hater or to discourage them, because they may well be on to something, and it's good for unique approaches like this to be tried. But I'm also not surprised there are no academic papers about this approach: if it had no positive effects, for the reasons I mention, it probably wouldn't get published.