When Sci-Hub made a lot of papers available to the public, I started clicking through to more references on Wikipedia.
My goal was to learn more and go deeper on subjects, but I was stunned by how often the linked citation didn't support the claim in the Wikipedia article. There were many times where the linked citation said the opposite of the Wikipedia article.
My theory was that overly competitive Wikipedia authors were skimming PubMed abstracts and assuming the paper would support their assertion. Ironically, some of the statements with 5 or more citations were the most incorrect.
Trying to correct these articles is sometimes like going to war with editors who refuse to admit they were wrong.
Not only do some papers not support the claim at all (I found this to often be the case when I used to read the news regularly), but there is often a critical detail left out of the abstract that is key to how the study should be interpreted. The vast majority of people citing papers, by my measure, are relying solely on the abstract even when the entire paper is available for free.
This is often the case outside of Wikipedia as well. Truthfully, I'm a layman because I don't have an advanced degree that suggests I'm qualified to interpret papers, but things are bad enough IMO that part of me wishes people would stop trying to communicate science to the public until this broken system heals at least somewhat. On a related note, it doesn't help that folks like Andrew Huberman are normalizing this idea that one or two studies are good enough to form conclusions about how average people can "optimize" (micromanage) their lives in ways that are clinically relevant. This isn't to take away from the good that Huberman does, but I think it sends the wrong message to people who don't actually have any experience reading papers.
One thing that might help is forbidding abstracts and conclusion sections in papers published in journals, which might cut down on some of the misunderstandings and make it harder to pass off a paper as support for a claim.
> Trying to correct these articles is sometimes like going to war with editors who refuse to admit they were wrong.
There is virtually no point in trying to fix Wikipedia pages until Wikipedia fixes itself. It has an extremely hard job, but its prolific editors are too bully-like and contradict themselves all the time.
> My goal was to learn more and go deeper on subjects, but I was stunned by how often the linked citation didn't support the claim in the Wikipedia article. There were many times where the linked citation said the opposite of the Wikipedia article.
There's an incentive for scholarly publishers and authors to have their works cited on Wikipedia, since there is a belief (supported by some research [0]) that citation on Wikipedia increases subsequent scholarly citations to those articles. Such a benefit could be used to increase a Journal Impact Factor and similar metrics [1,2], or the citations to one's own works, which is beneficial to academics competing for promotion, tenure, and research grants that are partially based on citation metrics [3]. I wouldn't be surprised if there was some indiscriminate citation on Wikipedia to bolster the metrics.

[0] https://scholarlykitchen.sspnet.org/2022/11/01/guest-post-wi...
[1] https://en.wikipedia.org/wiki/Impact_factor
[2] https://en.wikipedia.org/wiki/Citation_impact
[3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6668985/
Unfortunately, many people treat citations as a formality rather than something that helps a reader judge credibility. I try to cite as specifically as possible: the page (and the column, if available), the section, and, if you want to go the extra mile, the paragraph or line number. That makes checking easy. This was common when I worked at the USPTO as a patent examiner, but it wasn't standard practice when I was in academia.
Previous comment of mine on this issue (including a bit about how to make URLs more specific): https://news.ycombinator.com/item?id=23897686
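As one concrete way of making a URL more specific: most Chromium-based browsers support the Text Fragments syntax (#:~:text=...), which scrolls to and highlights an exact quote. A minimal Python sketch; the page URL, function name, and quoted passage below are made up for illustration:

    from urllib.parse import quote

    def text_fragment_url(page_url: str, quoted_text: str) -> str:
        """Build a deep link that scrolls to and highlights an exact quote
        (Text Fragments syntax, supported by most Chromium-based browsers)."""
        return f"{page_url}#:~:text={quote(quoted_text)}"

    # Hypothetical citation target, for illustration only:
    print(text_fragment_url("https://example.com/smith2019.html",
                            "the effect disappears"))
    # -> https://example.com/smith2019.html#:~:text=the%20effect%20disappears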
Over time, I've come to see that Wikipedia mirrors the broader internet: initially heralded as a game-changer, but when it really counts, it can be just as unreliable as the things that preceded it, if not worse.
I keep thinking of that quote from The Office:
"Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you are getting the best possible information."
I had a math professor who would occasionally correct an equation in the Wikipedia article about some theorem he invented, but someone would always change it back. He eventually gave up.
I am still irrationally annoyed that somebody made a Wikipedia article for their own made-up definition of high vs. low fantasy ("another world" vs. "our world", as opposed to the common definition, which was basically Tolkien vs. GRR Martin), based on a single decades-old article that didn't even mention the phrase "low fantasy", and managed to unilaterally change the definition.
No one could find a reference to contest it with, because nobody was writing articles defining terms that everyone understood; and the original editor's reference was not available online at the time, so no one could look at it and say "Uh, this is just some rando saying it would be cool if we called secondary-world stories 'high fantasy'."
Now the original, bogus reference is gone from the article, replaced with 20 low-effort articles that are obviously sourced from the same Wikipedia page they're being used to support. Boom, citogenesis! https://xkcd.com/978/
Honestly, I think the introduction and conclusion are almost always more important than the abstract. If you read the abstract, introduction, and conclusion, you should be able to understand the scope of what was done and what the paper actually shows.
I feel like this right here is what the singularity actually feels like.
With minimal effort, humans hook up AI to do some job, and things "just get better" rather than entropy taking its natural course and many things (without maintenance) trending towards "worse".
Once you have a bunch of these human- or superhuman-level agents doing mundane things on Wikipedia, they're now there in perpetuity, constantly improving it.
I suspect this is what's going to start to happen across the economy: all of a sudden, the sidewalks seem cleaner, trains run on time more often, traffic seems less congested, and latency in your favorite software product starts going down (with AI turned loose on that legacy software's code base, gradually refactoring and optimizing it in the background).
Effectively, what typically happens due to entropy (decay, latency, quality, dirtiness) will start to move in the opposite direction due to automation and background AI.
This reversal of perceived entropy will start gradually, then accelerate, and then on a day-to-day basis many things you touch in your daily life will be improving and then... singularity.
Some things might get micro-optimized by AI in ways that benefit people in general, but a lot will be micro-optimized to extract more profit out of customers at their expense and to replace or coerce workers in various ways. People will wield AI the way people wield every technology. Some techno-optimists thought that TV, personal computers, the internet, and other technologies would bring some bright enlightened future; I'm pretty sure AI optimists will see the same results as the others.
> Effectively, what typically happens due to entropy (decay, latency, quality, dirtiness) will start to move in the opposite direction due to automation and background AI.
Which of course begs the question, "Where's all that entropy going, shouldn't it still be going up?"
I have the opposite feeling. I only see AI accelerating entropy, as evidenced by the accelerated enshittification of everything online in the last few years.
> A neural network can identify references that are unlikely to support an article’s claims, and scour the web for better sources.
That seems like the wrong approach? The claims of an article should be informed by all relevant sources, not the selection of sources by the claims of the article.
There is no bit of information on the planet that is corroborated by all relevant sources. There will always be some percentage of people critical of the prevailing theory, and if you want to go for 100% consensus for every article then there would be no Wikipedia.
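For anyone wondering what "identify references that are unlikely to support an article's claims" might look like mechanically: at its core it's a claim-verification and ranking problem. Below is a minimal sketch of that general shape, not SIDE's actual architecture; the embedding model choice, example texts, and threshold are illustrative, and a serious verifier would use an entailment model rather than raw semantic similarity:

    # Sketch: flag a citation whose text is semantically distant from the
    # claim it is supposed to support. Low similarity here only suggests
    # irrelevance; detecting contradiction would need an entailment model.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

    def support_score(claim: str, citation_text: str) -> float:
        claim_vec, cite_vec = model.encode([claim, citation_text])
        return float(util.cos_sim(claim_vec, cite_vec))

    claim = "Vitamin D supplementation reduces the risk of fractures."
    cite = "We surveyed attitudes toward telehealth among rural clinics."
    if support_score(claim, cite) < 0.5:  # arbitrary threshold
        print("citation unlikely to support the claim; look for a better source")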
Urman points out that the Wikipedia users who tested the SIDE system were twice as likely to prefer neither of the references as they were to prefer the AI-suggested ones. “This would mean that in these cases, they would still go and search for the relevant citation online,” she says.
What I don't understand is why we can't just feed papers into some sort of text-to-logical-fallacy analysis that checks for known fallacies, checks argument logic, and checks sources (scoring based on study size and the other requirements that qualify good studies), and just stops things right in their tracks before they're added to the "corpus" of knowledge. I'm just talking out of my ass here; I'm sure politics and other human factors stop something so seemingly simple from being implemented.
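To be fair to the politics explanation, the "scores based on study size and other requirements" half is the only part that's easy to sketch; robust fallacy detection in free text is an open research problem, not a checklist. A toy version of the scoring idea, where every field, weight, and threshold is invented for illustration:

    # Toy quality gate for a study before it enters the "corpus".
    # Real methodological scoring (e.g. risk-of-bias checklists) is far
    # more involved; this only shows the shape of the idea.
    from dataclasses import dataclass

    @dataclass
    class Study:
        n_participants: int
        randomized: bool
        preregistered: bool
        conflicts_declared: bool

    def quality_score(s: Study) -> float:
        score = min(s.n_participants / 1000, 1.0)  # sample size, capped at 1.0
        score += 1.0 if s.randomized else 0.0
        score += 0.5 if s.preregistered else 0.0
        score += 0.25 if s.conflicts_declared else 0.0
        return score / 2.75                        # normalize to 0..1

    s = Study(n_participants=48, randomized=False,
              preregistered=False, conflicts_declared=True)
    if quality_score(s) < 0.5:                     # arbitrary cutoff
        print("hold for human review before citing")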
Perhaps someone who has read the paper itself can address this question:
If you input a claim that is unsupported and then search for support for it, isn't that a fundamental error? That is, is the software assuming the claim is accurate?
When SIDE’s results were shown to a group of Wikipedia users, 21% preferred the citations found by the AI, 10% preferred the existing citations and 39% did not have a preference.
I was wondering about that too. Later there is "[...] Wikipedia users who tested the SIDE system were twice as likely to prefer neither of the references as they were to prefer the AI-suggested ones". Not clear if this is the same as "did not have a preference". And if it's not, now we've got 112%: "prefer neither" would be twice 21% = 42%, and 42% + 21% + 10% + 39% = 112%.
SIDE looks impressive. I wonder how it works and finds these resources.
I think the evaluation is flawed; these subjective numbers would mean nothing if I did the survey. Looking at some Wikipedia pages in the SIDE demo (linked in comments here), it is clear that it in some cases fails to identify what claims are made in the article, and that the subjective choice of references is too binary.
I double-check references on Wikipedia in subjects where I have a basic understanding; it is usually easy to find better references, but it takes so much time. So this is still very impressive.
WP is, unfortunately, not obviously any worse than researchers in general are: https://gwern.net/leprechaun#miscitation
> Trying to correct these articles is sometimes like going to war with editors who refuse to admit they were wrong.
Yep. Because now you are doing 'OR' (original research) by interpreting the paper, especially when the abstract is just lying/spin.
Turns out it's easy to find a reference for anything if you search long enough.
> Ironically, some of the statements with 5 or more citations were the most incorrect.
There is nothing ironic about it. Wikipedia is run by bullies who make up references.
> Effectively, what typically happens due to entropy (decay, latency, quality, dirtiness) will start to move in the opposite direction due to automation and background AI.
This seems like an optimistic interpretation.
> ...feed papers into some sort of text-to-logical-fallacy analysis that checks for known fallacies...
https://www.researchgate.net/publication/340531396_Automated...
> latency in your favorite software product starts going down (with AI turned loose on that legacy software's code base, gradually refactoring and optimizing it in the background)
What annoys me is that all the managers who were accruing technical debt were right. AI will clean it up for them, decades later.