I've never heard the caveat that it can't be attributable to misinformation in the pre-training corpus. For frontier models, we don't even have access to the enormous training corpus, so we have no way of verifying whether the model is regurgitating misinformation it saw there or inventing something out of whole cloth.
Aurornis|2 months ago
If the LLM is accurately reflecting the training corpus, it wouldn’t be considered a hallucination. The LLM is operating as designed.
Matters of access to the training corpus are a separate issue.
Workaccount2|2 months ago
I want to say it was some fact about cheese or something that was in fact wrong. However, you could also see the source Gemini cited in the ad, and when you went to that source, it was some local farm's 1998-style HTML homepage, and on that page they had the incorrect factoid about the cheese.
CGMthrowaway|2 months ago
That would mean that there is never any hallucination.
The point of the original comment was distinguishing between fact and fiction, which an LLM just cannot do. (It's an unsolved problem among humans, which spills into the training data.)
parineum|2 months ago
eMPee584|2 months ago
Also, statements made with certainty about fictitious "honey pot prompts" are a problem. Plausible extrapolation from the data should be governed more by internal confidence. Luckily, there are benchmarks for that now, I believe.