
ml_more | 6 months ago

We did a test of GPT-5 yesterday. We asked it to generate a synopsis of a scientific topic and cite sources. We then checked those sources. GPT-5 still hallucinated 65% of the citations. It did things like:

- make up the paper title
- make up the authors for a real paper title
- mix a real title with a real journal that didn't publish it

If it can't even reference real papers, it certainly can't be trusted to match up claims of fact with real sources.
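
For what it's worth, this kind of spot check is easy to script outside the model. Here's a rough Python sketch that looks a cited title up against the public Crossref API; the 0.9 similarity cutoff is just a guess, not a tuned value:

  # Rough sketch: ask Crossref whether a cited title actually exists.
  # Endpoint and response fields are Crossref's public REST API;
  # the 0.9 similarity threshold is an arbitrary assumption.
  import requests
  from difflib import SequenceMatcher

  def title_exists(cited_title, threshold=0.9):
      resp = requests.get(
          "https://api.crossref.org/works",
          params={"query.bibliographic": cited_title, "rows": 3},
          timeout=10,
      )
      resp.raise_for_status()
      for item in resp.json()["message"]["items"]:
          for real_title in item.get("title", []):
              score = SequenceMatcher(None, cited_title.lower(),
                                      real_title.lower()).ratio()
              if score >= threshold:
                  return True, item.get("DOI")
      return False, None

  print(title_exists("Attention Is All You Need"))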

Current AI tools generate citations that LOOK real but ARE fake. This might not be solvable inside the LLM. If anyone could do it, it'd be OpenAI. (OK, maybe I'm giving them too much credit, but they have a crap-ton of money and seem to show a real interest in making their AI better.)

If it can't be done in the LLM, we can't trust LLMs basically ever. I suppose there's a pretty big loophole here: doing it outside the LLM but INSIDE the LLM product would be good enough.

The first AI tool to incorporate that (internal citation and claim checking) will win, because if the AI can check itself and prevent hallucinated garbage from ever reaching the user, we can start to trust it, and then these tools can do everything we've been promised. Until that day comes, we can't trust them for anything.
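
To be concrete about the loophole, here's a toy sketch of what "outside the LLM but inside the product" could look like. extract_citations and title_exists stand in for whatever real extraction and lookup you'd use; they're placeholders, not any vendor's actual pipeline:

  # Toy sketch of a product-level verification pass: generate, then check
  # every citation before the answer reaches the user. extract_citations()
  # and title_exists() are assumed helpers, not a real API.
  def answer_with_citation_check(llm, prompt, extract_citations, title_exists):
      draft = llm(prompt)
      report = []
      for cite in extract_citations(draft):
          ok, doi = title_exists(cite["title"])
          report.append({"title": cite["title"], "verified": ok, "doi": doi})
      bad = [r["title"] for r in report if not r["verified"]]
      if bad:
          # Regenerate, strip the bad references, or warn -- the key point is
          # that unverifiable citations never reach the user silently.
          draft += "\n\n[unverified citations: " + "; ".join(bad) + "]"
      return draft, report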


beacon294 | 6 months ago

Google already did this; give the free Gemini Deep Research a spin. It's not perfect, but I have a feeling you'll be surprised if this is your honest impression.