top | item 46723555


j2kun | 1 month ago

I spot-checked one of the flagged papers (from Google, co-authored by a colleague of mine)

The paper was https://openreview.net/forum?id=0ZnXGzLcOg and the problem flagged was "Two authors are omitted and one (Kyle Richardson) is added. This paper was published at ICLR 2024." In other words, for one cited paper, the author list was off and the venue was wrong. And this citation appeared in the background section and was not fundamental to the validity of the paper. So the citation was not fabricated, but it was incorrectly attributed (perhaps via use of an AI autocomplete).

I think there are some egregious papers in their dataset, and this error does make me pause to wonder how much of the rest of the paper used AI assistance. That said, the "single error" papers in the dataset seem similar to the one I checked: relatively harmless and minor errors (which would be immediately caught by a DOI checker), and so I have to assume some of these were included in the dataset mainly to amplify the author's product pitch. It succeeded.


i_am_proteus|1 month ago

>this error does make me pause to wonder how much of the rest of the paper used AI assistance

And this is what's operative here. The error spotted, and the entire class of error spotted, is easily checked/verified by a non-domain expert. These are the errors we can confirm readily, with an obvious and unmistakable signature of hallucination.

If these are the only errors, we are not troubled. However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations. They are a signature that some LLM was used to generate parts of the paper and the responsible authors used this LLM without care.

Checking the rest of the paper requires domain expertise, and perhaps an attempt at reproducing the authors' results. That the rest of the paper is now in doubt, and that this problem is so widespread, threatens the validity of the fundamental activity these papers represent: research.

neilv|1 month ago

> If these are the only errors, we are not troubled. However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations. They are a signature that some LLM was used to generate parts of the paper and the responsible authors used this LLM without care.

I am troubled by people using an LLM at all to write academic research papers.

It's a shoddy, irresponsible way to work. And also plagiarism, when you claim authorship of it.

I'd see a failure of the 'author' to catch hallucinations as more like a failure to hide evidence of misconduct.

If academic venues are saying that using an LLM to write your papers is OK ("so long as you look it over for hallucinations"?), then those academic venues deserve every bit of operational pain and damaged reputation that will result.

fn-mote|1 month ago

This seems like finding spelling errors and using them to cast the entire paper into doubt.

I am unconvinced that the particular error mentioned above is a hallucination, and even less convinced that it is a sign of some kind of rampant use of AI.

I hope to find better examples later in the comment section.

jvanderbot|1 month ago

The problem is, 10 years ago when I was still publishing even I would let an incorrect citation go through b/c of an old bibtex file or some such.

ls612|1 month ago

Google Scholar and the vagaries of copy/paste errors have mangled BibTeX ever since it became a thing; a single citation with these sorts of errors may not even be AI, just "normal" mistakes.

jasonfarnon|1 month ago

Agree, I don't find this evidence of AI. It often happens that author lists change, there are multiple venues, or I'm using an old version of the paper. We also need to see the denominator: did this Google paper have this one bad citation out of 20 references, or out of 60?

Also everyone I know has been relying on google scholar for 10+ years. Is that AI-ish? There are definitely errors on there. If you would extrapolate from citation issues to the content in the age of LLMs, were you doing so then as well?

It's the age-old debate about spelling/grammar issues in technical work. In my experience it rarely gets to the point that these errors (e.g., from non-native speakers) affect my interpretation. Others claim to infer shoddy content from them.

andy12_|1 month ago

> However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations

Given how stupidly tedious and error-prone citations are, I have no trouble believing that the citation error could be the only major problem with the paper, and that it's not a sign of low quality by itself. It would be another matter entirely if we were talking about something actually important to the ideas presented in the paper, but it isn't.

_alternator_|1 month ago

The missing analysis is, of course, a comparison with pre-LLM conferences, like 2022 or 2023, that would show a "false positive" rate for the tool.

anishrverma|1 month ago

Agreed.

What I find more interesting is how easy these errors are to introduce and how unlikely they are to be caught. As you point out, a DOI checker would immediately flag this. But citation verification isn’t a first-class part of the submission or review workflow today.

We're still treating citations as narrative text rather than verifiable objects. That implicit trust model worked when volumes were lower, but it doesn't seem to scale anymore.

There's a project I'm working on at Duke University, where we are building a system that tries to address exactly this gap by making references and review labor explicit and machine-verifiable at the infrastructure level. There's a short explainer here that lays out what we mean, if useful: https://liberata.info/

dexdal|1 month ago

Citation checks are a workflow problem, not a model problem. Treat every reference as a dependency that must resolve and be reproducible. If the checker cannot fetch and validate it, it does not ship.
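A minimal sketch of such a resolve-and-validate gate (the `check_reference` helper and its dict fields are hypothetical names, not from any existing tool; in practice the canonical record would be fetched from a registry such as Crossref, keyed by DOI):

```python
import re

# DOIs start with "10.", a numeric registrant prefix, "/", and a suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

def check_reference(entry: dict, canonical: dict) -> list[str]:
    """Compare a cited reference against canonical metadata
    (e.g. the record a DOI registry returns) and report mismatches."""
    problems = []

    # The reference must carry a resolvable identifier at all.
    doi = entry.get("doi", "")
    if not DOI_RE.match(doi):
        problems.append("missing or malformed DOI")

    # Added or dropped authors: the exact error class flagged upthread.
    cited = {a.strip().lower() for a in entry.get("authors", [])}
    actual = {a.strip().lower() for a in canonical.get("authors", [])}
    if cited != actual:
        problems.append(
            f"author mismatch: extra={sorted(cited - actual)}, "
            f"missing={sorted(actual - cited)}"
        )

    # Wrong venue, when both records state one.
    if entry.get("venue") and canonical.get("venue"):
        if entry["venue"].strip().lower() != canonical["venue"].strip().lower():
            problems.append("venue mismatch")

    return problems
```

A CI-style gate could then refuse to build the bibliography while any entry returns problems, which covers exactly the added/dropped-author and wrong-venue errors discussed upthread.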

nazgul17|1 month ago

The thing is, when you copy paste a bibliography entry from the publisher or from Google Scholar, the authors won't be wrong. In this case, it is. If I were to write a paper with AI, I would at least manage the bibliography by hand, conscious of hallucinations. The fact that the hallucination is in the bibliography is a pretty strong indicator that the paper was written entirely with AI.

jmmcd|1 month ago

Google Scholar provides imperfect citations - very often wrong article type (eg article versus conference paper), but up to and including missing authors, in my experience.

arjvik|1 month ago

I'm not sure I agree... while I don't ever see myself writing papers with AI, I hate wrangling a bibtex bibliography.

I wouldn't trust today's GPT-5-with-web-search to turn a bullet-point list of papers into proper citations without checking myself, but maybe I will trust GPT-X-plus-agent to do this.

nativeit|1 month ago

I see your point, but I don’t see where the author makes any claims about the specifics of the hallucinations, or their impact on the papers’ broader validity. Indeed, I would have found the removal of supposed “innocuous” examples to be far more deceptive than simply calling a spade a spade, and allowing the data to speak for itself.

j2kun|1 month ago

The author calls the mistakes "confirmed hallucinations" without proof (just more or less evidence). The data never "speak for itself." The author curates the data and crafts a story about it. This story presented here is very suggestive (even using the term "hallucination" is suggestive). But calling it "100 suspected hallucinations", or "25 very likely hallucinations" does less for the author's end goal: selling their service.

gowld|1 month ago

The point is that they should focus on the meaningful errors, not on automating the detection of meaningless ones.

m-schuetz|1 month ago

BibTeX entries are often also incorrectly generated. E.g., Google Scholar sometimes puts the names of the editors instead of the authors into the entry.

worik|1 month ago

> BibTeX entries are often also incorrectly generated

...and including the erroneous entry is squarely the author's fault.

Papers should be carefully crafted, not churned out.

I guess that makes me sweetly naive

nearbuy|1 month ago

The rate here (about 1% of papers) just doesn't seem that bad, especially if many of the errors are minor and don't affect the validity of the results. In other fields, over half of high-impact studies don't replicate.

davidguetta|1 month ago

Yeah, even for the whole "Jane Doe / Jame Smith" example, my first thought is that it could have been a LaTeX default value.

There was dumb stuff like this before the GPT era; it's far from convincing.

ls612|1 month ago

There are people who just want to punish academics for the sake of punishing academics. Look at all the people downthread salivating over blacklisting, or even criminally charging with felony fraud, people who make errors like this. It's the perfect brew of anti-AI and anti-academia sentiment.

Also, in my field (economics), by far the biggest source of finding old papers invalid (or less valid, most papers state multiple results) is good old fashioned coding bugs. I'd like to see the software engineers on this site say with a straight face that writing bugs should lead to jail time.

nativeit|1 month ago

> Between 2020 and 2025, submissions to NeurIPS increased more than 220% from 9,467 to 21,575. In response, organizers have had to recruit ever greater numbers of reviewers, resulting in issues of oversight, expertise alignment, negligence, and even fraud.

I don’t think the point being made is “errors didn’t happen pre-GPT”, rather the tasks of detecting errors have become increasingly difficult because of the associated effects of GPT.

bjourne|1 month ago

Still a citation to a work you clearly have not read...

fmbb|1 month ago

> So the citation was not fabricated, but it was incorrectly attributed (perhaps via use of an AI autocomplete).

Well the title says ”hallucinations”, not ”fabrications”. What you describe sounds exactly like what AI builders call hallucinations.

j2kun|1 month ago

Read the article. The author uses the word "fabricate" repeatedly to describe the situation where the wrong authors are in the citation.

beowulfey|1 month ago

As with anything, it is about trusting your tools. Who is culpable for such errors? In the days of human authors, the person writing the text is responsible for not making these errors. When AI does the writing, the person whose name is on the paper should still be responsible—but do they know that? Do they realize the responsibility they are shouldering when they use these AI tools? I think many times they do not; we implicitly trust the outputs of these tools, and the dangers of that are not made clear.

lou1306|1 month ago

> relatively harmless and minor errors

They are not harmless. These hallucinated references are ingested by Google Scholar, Scopus, etc., and with enough time they will poison those wells. It is also plain academic malpractice, no matter how "minor" the reference is.

janalsncm|1 month ago

This is par for the course for GPTZero, which also falsely claims it can detect AI-generated text, a fundamentally impossible task to do accurately.

ainch|1 month ago

I'm not going to bat for GPTZero, but I think it's clearly possible to identify some AI-written prose. Scroll through LinkedIn or Twitter replies and there are clear giveaways in tone, phrasing and repeated structures (it's not just X it's Y).

Not to say that you could ever feasibly detect all AI-generated text, but if it's possible for people to develop a sense for the tropes of LLM content then there's no reason you couldn't detect it algorithmically.

StopDisinfo910|1 month ago

The example you provided doesn't sit right with me.

If the mistake is one error of authors and venue in a citation, I find it fairly disingenuous to call that a hallucination. At least, it doesn't meet the threshold for me.

I have seen this kind of mistake made long before LLMs were even a thing. We used to call them just that: mistakes.

currymj|1 month ago

The earlier list of ICLR papers had way more egregious examples. Those were taken from the list of submissions, not accepted papers, however.

bjourne|1 month ago

Sorry, but blaming it on "AI autocomplete" is the dumbest excuse ever. Author lists come from BibTeX entries, and while those often contain errors since they can come from many sources, they do not contain completely made-up authors. I don't share your view that hallucinated citations are less damaging in the background section. Background, related work, and introduction are the sections where citations most often show up. These sections are meant to be read, and generating them with AI is plain cheating.

j2kun|1 month ago

I'm not blaming anything on anything, because I did not (nor did the authors) confirm the cause of any of these errors.

> I don't share your view that hallucinated citations are less damaging in background section.

Who exactly is damaged in this particular instance?

David_Osipov|1 month ago

Great job! I've tried to test their tool as well, but was totally paywalled.

NedF|1 month ago

[deleted]