lowyek|1 year ago
I find it fascinating that in other fields you see lots of theorems/results long before practical results are found, but at this forefront of innovation I have hardly seen any paper discussing hallucinations and lower/upper bounds on them. Or maybe I just didn't open Hacker News on the right day when such a paper was published. I would love to understand the hallucination phenomenon more deeply, and the mathematics behind it.
hbn|1 year ago
There isn't really such a thing as a "hallucination," and honestly I think people should use the word less. Whether an LLM tells you the sky is blue or the sky is purple, it's not doing anything different: it's just spitting out a sequence of characters it was trained to produce, in the hope that it's what the user wants. There is no definable failure state you can call a "hallucination"; it's operating as correctly as with any other output. It's just that sometimes we can tell, either immediately or through fact-checking, that it spat out a string of text claiming something incorrect.
If you start asking an LLM for political takes, you'll get very different answers from humans about which ones are "hallucinations"
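One way to see the point: a toy next-token sampler (the distribution below is invented for illustration) performs exactly the same operation whether the sampled continuation happens to be true or false.

```javascript
// Toy next-token sampler: whether it emits "blue" or "purple", the model
// does the identical thing -- sample from a probability distribution over
// tokens. Nothing in the mechanism marks one outcome as a failure state.
// The distribution here is invented, not from any real model.
const nextToken = { blue: 0.90, purple: 0.07, green: 0.03 };

function sample(dist, r) {
  // r is a uniform random draw in [0, 1); walk the cumulative distribution.
  let acc = 0;
  for (const [token, p] of Object.entries(dist)) {
    acc += p;
    if (r < acc) return token;
  }
}

console.log(sample(nextToken, 0.5));  // "blue"   -- the "correct" answer
console.log(sample(nextToken, 0.95)); // "purple" -- the "hallucination"
```

Both calls run the same loop over the same table; only the random draw differs.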
raincole|1 year ago
People say it's "anthropomorphizing," but honestly I can't see it. The I in AI stands for intelligence; is that anthropomorphizing? The L in ML? Reading and writing are clearly human activities, so is using read/write instead of input/output anthropomorphizing? How about "computer," a word that once meant a human who does computing? Is there any word we can use safely without anthropomorphizing?
[1]: And please don't argue what's "wrong".
hatthew|1 year ago
Hallucination might not be the best word, but I don't think it's a bad word. If a weather model predicted a storm when there isn't a cloud in the sky, I wouldn't have a problem with saying "the weather model had a hallucination." 50 years ago, weather models made incorrect predictions quite frequently. That's not because they weren't modeling correct weather, it's because we simply didn't yet have good models and clean data.
Fundamentally, we could fix most LLM hallucinations with better model implementations and cleaner data. In the future we will probably be able to model factuality outside of the context of human language, and that will probably be the ultimate solution for correctness in AI, but I don't think that's a fundamental requirement.
shrimp_emoji|1 year ago
Humans also confabulate, but not as a result of "hallucinations." They usually do it because that's actually what brains like to do, whether it's making up stories about how the world was created or, more infamously, in the case of neurological disorders where the machinery's penchant for it becomes totally unmoderated and a person just spits out false information that they themselves cannot recognize as false. https://en.m.wikipedia.org/wiki/Confabulation
nl|1 year ago
This is a very "closed world" view of the phenomenon which looks at an LLM as a software component on its own.
But "hallucination" is a user experience problem, and it describes the experience very well. If you are using a code assistant and it suggests using APIs that don't exist then the word "hallucination" is entirely appropriate.
A vaguely similar analogy is the addition of the `let` and `const` keywords in ES6 JavaScript. While the behavior of `var` was "correct" as per the spec, the user experience was horrible: bug-prone and confusing.
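The `var` pitfall being alluded to is easiest to see in the classic loop-closure example: the behavior is per-spec "correct" in both cases, but only one matches what users expect.

```javascript
// `var` is function-scoped, so every callback closes over the same `i`,
// which has already reached 3 by the time any callback runs.
const withVar = [];
for (var i = 0; i < 3; i++) {
  withVar.push(() => i);
}
console.log(withVar.map(f => f())); // [ 3, 3, 3 ]

// `let` is block-scoped with a fresh binding per loop iteration,
// so each callback sees the value from its own iteration.
const withLet = [];
for (let j = 0; j < 3; j++) {
  withLet.push(() => j);
}
console.log(withLet.map(f => f())); // [ 0, 1, 2 ]
```

Both snippets operate "as designed"; the complaint, like the complaint about hallucination, is about the user experience of that design.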
IanCal|1 year ago
We won't, and we'll see this constant distraction.
sandworm101|1 year ago
mortenjorck|1 year ago
There's definitely room for a better label, though. "Empirical mismatch" doesn't quite have the same ring as "hallucination," but it's probably a more accurate place to start from.
emporas|1 year ago
Is it possible for a chess engine to compute the next move and be absolutely sure it is the best one? It's not; the move is a statistical approximation, but still very useful.
unknown|1 year ago
[deleted]
sqeaky|1 year ago
wincy|1 year ago
What am I supposed to call that?
beernet|1 year ago
That being said, people new to the field tend to believe that these models are fact machines. In fact, they are the complete opposite.
unknown|1 year ago
[deleted]
dennisy|1 year ago
There's no real way to mathematically prove this, considering there's also no way to know whether the training data had this "hallucination" in it.
ben_w|1 year ago
Investigate it with the tools of psychology, as suited for use on a new, non-human creature we've never encountered before.
cainxinth|1 year ago
https://www.nytimes.com/2023/11/06/technology/chatbots-hallu...
eskibars|1 year ago
For us, hallucination is about the ability to respond accurately in an "open book" setting, specifically for retrieval-augmented generation (RAG) applications. That is, given a set of retrieved information (X), does the LLM-produced summary:
1. Include any "real" information not contained in X? If yes, it's a hallucination, even if that information is general knowledge. We see this as an important way to classify hallucinations in a RAG+summary context because enterprises have told us they don't want the LLMs "reading between the lines" to infer things. To pick an absurd/extreme case to make the point: say a genetic research firm using CRISPR finds it can create a purple zebra. If the retrieval system in the RAG pipeline says "zebras can be purple" based on that latest research, we don't want the LLM to override this with its prior knowledge that zebras are only ever black/white/brown. We'd treat that as a hallucination.
2. On the opposite extreme, an easy way to avoid hallucinating would be for the LLM to answer "I don't know" to everything. That has other obvious negative effects, so we also evaluate LLMs on their ability to answer.
We look at the factual consistency, answer rate, summary length, and some other metrics internally to focus prompt engineering, model selection, and model training: https://github.com/vectara/hallucination-leaderboard
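The "grounded in X" test described above can be sketched with a crude word-overlap heuristic: flag any summary sentence whose content words mostly don't appear in the retrieved context. A real evaluation of factual consistency is far more sophisticated than this; the function names, threshold, and example data below are all invented for illustration.

```javascript
// Toy grounding check for a RAG summary: a sentence whose content words
// mostly do not appear in the retrieved context X is flagged as a
// potential hallucination. This word-overlap heuristic is illustrative
// only; production systems use trained factual-consistency models.
function contentWords(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

function flagUngrounded(summary, context, threshold = 0.5) {
  const contextVocab = new Set(contentWords(context));
  return summary
    .split(/[.!?]\s+/)                 // naive sentence split
    .filter(s => s.trim().length > 0)
    .filter(sentence => {
      const words = contentWords(sentence);
      const grounded = words.filter(w => contextVocab.has(w)).length;
      // Flag sentences where under half the content words appear in X.
      return words.length > 0 && grounded / words.length < threshold;
    });
}

// The purple-zebra example from above: the first summary sentence is
// grounded in X, the second introduces information not in X.
const context = "Recent CRISPR research shows zebras can be purple.";
const summary = "Zebras can be purple. Elephants are always green mammals.";
console.log(flagUngrounded(summary, context));
// [ 'Elephants are always green mammals.' ]
```

Note that this sketch only catches criterion 1 (added information); criterion 2, the over-refusing "I don't know" failure mode, would need a separate answer-rate measurement as described above.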
lowyek|1 year ago
unknown|1 year ago
[deleted]
amelius|1 year ago
unknown|1 year ago
[deleted]