item 45670377

nopinsight | 4 months ago

Hallucination Leaderboard: "This evaluates how often an LLM introduces hallucinations when summarizing a document."

https://github.com/vectara/hallucination-leaderboard

If the figures on this leaderboard are to be trusted, many frontier and near-frontier models are already better than the median white-collar worker in this respect.

Note: to be clear, the leaderboard doesn't cover tool calling.

whatever1 | 4 months ago

I’ve been reviewing academic papers for decades, and I’ve reviewed thousands of them. I’ve never seen a fake citation. I’ve seen misrepresented sources and cooked data, but never a straight-up fake citation.

So the min, max, and median are all 0.

nopinsight | 4 months ago

Agreed that current LLMs have low floors despite decently high ceilings.

Note that people who write academic papers are quite far from the median white-collar worker.