(no title)
john61|3 months ago
OP says it is because predicting the next token can be correct or not, but the output always looks plausible, because plausibility is what the model calculates. Therefore it is dangerous and cannot be fixed, because that is how the approach works in essence.
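As a toy illustration of that point (the vocabulary, logits, and command names below are invented, not any real model's internals), next-token sampling always returns some fluent, confident-looking continuation, whether or not the claim it completes is true:

    # Toy illustration only: made-up vocabulary and made-up logits,
    # not any real model. Sampling always yields a plausible-looking
    # continuation regardless of whether it is factually correct.
    import math
    import random

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    # Hypothetical continuations of "The console command for that is ..."
    vocab = ["/enable_feature", "/feature_on", "/toggle_feature"]
    logits = [2.1, 1.7, 0.4]  # none of these commands need exist in the game
    probs = softmax(logits)

    # The training objective only rewards "sounds right", so the sample
    # is always delivered with the same confidence, correct or not.
    choice = random.choices(vocab, weights=probs, k=1)[0]
    print(choice, [round(p, 2) for p in probs])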
dangus|3 months ago
Literally yesterday ChatGPT hallucinated an entire feature of a mod for a video game I am playing, including making up a fake console command.
The feature straight up doesn't exist; it just seemed like a relatively plausible thing to exist.
This is still happening. It never stopped happening. I don’t even see a real slowdown in how often it happens.
It sometimes feels like the only thing saving LLMs is when they're forced to tap into a better system, like running a search engine query.
WhyOhWhyQ|3 months ago
My hit rate using these models for academic questions is low, but non-trivial. I've definitely learned new math from using them, but it's really just an indulgence because they make stuff up so frequently.
jl6|3 months ago
Maybe I should have asked it to write a patch that implements that feature.
hsuduebc2|3 months ago
cess11|3 months ago
The response to your query might not be what you needed, similar to interacting with an RDBMS and mistyping a table name and getting data from another table or misremembering which tables exist and getting an error. We would not call such faults "hallucinations", and shouldn't when the database is a pile of eldritch vectors either. If we persist in doing so we'll teach other people to develop dangerous and absurd expectations.
bgwalter|3 months ago
This is actually very profound. All free models are only reasonable if they scrape 100 web pages (according to their own output) before answering. Even then they usually have multiple errors in their output.
ajuc|3 months ago
It hallucinates whole lives out of nothing but stereotypes.
bitwize|3 months ago
[deleted]
anthonypasq|3 months ago
Lerc|3 months ago
The model expected a feature to exist because it fitted with the overall structure of the interface.
This in itself can be a valuable form of feedback. I currently don't know of anyone doing it, but testing interfaces by getting LLMs to use them could be an excellent resource. If the AI runs into trouble, it might be worth checking your designs to see if you have any inconsistencies, redundancies, or other confusion-causing issues.
One would assume that a consistent user interface would be easier for both AI and humans. Fixing the issues would improve it for both.
That failure could be leveraged into an automated process that identifies areas to improve.
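A minimal sketch of that idea, with an invented command set and a hypothetical llm_complete callback standing in for whatever interface and completion API you would actually use:

    # Sketch only: REAL_COMMANDS and llm_complete are invented stand-ins,
    # not a real game's console or a real completion API.
    REAL_COMMANDS = {"spawn", "teleport", "give_item", "set_time"}

    def probe_interface(task, llm_complete):
        """Ask the model to use the interface, then report any commands it
        proposed that the interface does not actually expose."""
        prompt = (
            "You control a console with exactly these commands: "
            + ", ".join(sorted(REAL_COMMANDS))
            + ". Reply with one command per line to accomplish: " + task
        )
        proposed = {line.split()[0]
                    for line in llm_complete(prompt).splitlines()
                    if line.strip()}
        # Commands the model expected but that don't exist can flag gaps or
        # naming inconsistencies in the design, not just model error.
        return sorted(proposed - REAL_COMMANDS)

    # Canned reply instead of a live model call:
    fake_llm = lambda _prompt: "set_weather rain\nspawn npc"
    print(probe_interface("make it rain near an NPC", fake_llm))  # ['set_weather']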