When Sci-Hub made a lot of papers available to the public, I started clicking through to more references on Wikipedia.
My goal was to learn more and go deeper on subjects, but I was stunned by how often the linked citation didn't support the claim in the Wikipedia article. There were many times where the linked citation said the opposite of the Wikipedia article.
My theory was that overly competitive Wikipedia authors were skimming PubMed abstracts and assuming the paper would support their assertion. Ironically, some of the statements with 5 or more citations were the most incorrect.
Trying to correct these articles is sometimes like going to war with editors who refuse to admit they were wrong.
Not only do some papers not support the claim at all (I found this to often be the case when I used to read the news regularly), but there is often a critical detail left out of the abstract that is key to how the study should be interpreted. The vast majority of people citing papers, by my measure, are relying solely on the abstract even when the entire paper is available for free.
This is often the case outside of Wikipedia as well. Truthfully, I'm a layman because I don't have an advanced degree that suggests I'm qualified to interpret papers, but things are bad enough IMO that part of me wishes people would stop trying to communicate science to the public until this broken system heals at least somewhat. On a related note, it doesn't help that folks like Andrew Huberman are normalizing this idea that one or two studies are good enough to form conclusions about how average people can "optimize" (micromanage) their lives in ways that are clinically relevant. This isn't to take away from the good that Huberman does, but I think it sends the wrong message to people who don't actually have any experience reading papers.
One thing that might help is forbidding abstracts and conclusion sections in papers published in journals, which might cut down on some of the misunderstandings and make it harder to pass off a paper as support for a claim.
> Trying to correct these articles is sometimes like going to war with editors who refuse to admit they were wrong.
There is virtually no point in trying to fix Wikipedia pages until Wikipedia fixes itself. It has an extremely hard job, but its prolific editors are too bully-like and contradict themselves all the time.
> My goal was to learn more and go deeper on subjects, but I was stunned by how often the linked citation didn't support the claim in the Wikipedia article. There were many times where the linked citation said the opposite of the Wikipedia article.
There's an incentive for scholarly publishers and authors to have their works cited on Wikipedia, since there is a belief (supported by some research [0]) that citation on Wikipedia increases subsequent scholarly citations to those articles. Such a benefit could be used to increase a Journal Impact Factor and similar metrics [1,2], or the citations to one's own works, which is beneficial to academics competing for promotion, tenure, and research grants that are partially based on citation metrics [3]. I wouldn't be surprised if there was some indiscriminate citation on Wikipedia to bolster the metrics.

[0] https://scholarlykitchen.sspnet.org/2022/11/01/guest-post-wi...
[1] https://en.wikipedia.org/wiki/Impact_factor
[2] https://en.wikipedia.org/wiki/Citation_impact
[3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6668985/
Unfortunately, many people treat citations as a formality rather than something that helps a reader judge credibility. I try to cite as specifically as possible: the page (and the column, if available), the section, and, if you want to go the extra mile, the paragraph or line number. That makes checking easy. This was common when I worked at the USPTO as a patent examiner, but it wasn't standard practice when I was in academia.
Previous comment of mine on this issue (including a bit about how to make URLs more specific): https://news.ycombinator.com/item?id=23897686
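As one concrete way of making a URL more specific: most Chromium-based browsers support the Text Fragments syntax (#:~:text=...), which scrolls to and highlights an exact quote. A minimal Python sketch; the page URL, function name, and quoted passage below are made up for illustration:

    from urllib.parse import quote

    def text_fragment_url(page_url: str, quoted_text: str) -> str:
        """Build a deep link that scrolls to and highlights an exact quote
        (Text Fragments syntax, supported by most Chromium-based browsers)."""
        return f"{page_url}#:~:text={quote(quoted_text)}"

    # Hypothetical citation target, for illustration only:
    print(text_fragment_url("https://example.com/smith2019.html",
                            "the effect disappears"))
    # -> https://example.com/smith2019.html#:~:text=the%20effect%20disappears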
Over time, I've come to see that Wikipedia mirrors the broader internet: initially heralded as a game-changer, but when it really counts, it can be just as unreliable as the things that preceded it, if not worse.
I keep thinking of that quote from The Office:
"Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you are getting the best possible information."
I had a math professor who would occasionally correct an equation in the Wikipedia article about some theorem he invented, but someone would always change it back. He eventually gave up.
I am still irrationally annoyed that somebody made a Wikipedia article for their own made-up definition of high vs. low fantasy ("another world" vs. "our world", as opposed to the common definition, which was basically Tolkien vs. GRR Martin), based on a single decades-old article that didn't even mention the phrase "low fantasy", and managed to unilaterally change the definition.
No one could find a reference to contest it with, because nobody was writing articles defining terms that everyone understood; and the original editor's reference was not available online at the time, so no one could look at it and say "Uh, this is just some rando saying it would be cool if we called secondary-world stories 'high fantasy'."
Now the original, bogus reference is gone from the article, replaced with 20 low-effort articles that are obviously sourced from the same Wikipedia page they're being used to support. Boom, citogenesis! https://xkcd.com/978/
Honestly, I think the introduction and conclusion are almost always more important than the abstract. If you read the abstract, introduction, and conclusion, you should be able to understand the scope of what was done and what the paper actually shows.
I feel like this right here is what the singularity actually feels like.
With minimal effort, humans hook up AI to do some job, and things "just get better" rather than entropy taking its natural course and many things (without maintenance) trending towards "worse".
Once you have a bunch of these human- or superhuman-level agents doing mundane things on Wikipedia, they're now there in perpetuity, constantly improving it.
I suspect this is what's going to start to happen across the economy: all of a sudden, the sidewalks seem cleaner, trains run on time more often, traffic seems less congested, and latency in your favorite software product starts going down (with AI turned loose on that legacy software's code base, gradually refactoring and optimizing it in the background).
Effectively, what typically happens due to entropy (decay, latency, quality, dirtiness) will start to move in the opposite direction due to automation and background AI.
This reversal of perceived entropy will start gradually, then accelerate, and then on a day-to-day basis many things you touch in your daily life will be improving and then... singularity.
Some things might get micro-optimized by AI in ways that benefit people in general, but a lot will be micro-optimized to extract more profit out of customers at their expense and to replace or coerce workers in various ways. People will wield AI the way people wield every technology. Some techno-optimists thought that TV, personal computers, the internet, and other technologies would bring some bright enlightened future; I'm pretty sure AI optimists will see the same results as the others.
> Effectively, what typically happens due to entropy (decay, latency, quality, dirtiness) will start to move in the opposite direction due to automation and background AI.
Which of course begs the question, "Where's all that entropy going, shouldn't it still be going up?"
I have the opposite feeling. I only see AI accelerating entropy, as evidenced by the accelerated enshittification of everything online in the last few years.
> A neural network can identify references that are unlikely to support an article’s claims, and scour the web for better sources.
That seems like the wrong approach? The claims of an article should be informed by all relevant sources, not the selection of sources by the claims of the article.
There is no bit of information on the planet that is corroborated by all relevant sources. There will always be some percentage of people critical of the prevailing theory, and if you want to go for 100% consensus for every article then there would be no Wikipedia.
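For anyone wondering what "identify references that are unlikely to support an article's claims" might look like mechanically: at its core it's a claim-verification and ranking problem. Below is a minimal sketch of that general shape, not SIDE's actual architecture; the embedding model choice, example texts, and threshold are illustrative, and a serious verifier would use an entailment model rather than raw semantic similarity:

    # Sketch: flag a citation whose text is semantically distant from the
    # claim it is supposed to support. Low similarity here only suggests
    # irrelevance; detecting contradiction would need an entailment model.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

    def support_score(claim: str, citation_text: str) -> float:
        claim_vec, cite_vec = model.encode([claim, citation_text])
        return float(util.cos_sim(claim_vec, cite_vec))

    claim = "Vitamin D supplementation reduces the risk of fractures."
    cite = "We surveyed attitudes toward telehealth among rural clinics."
    if support_score(claim, cite) < 0.5:  # arbitrary threshold
        print("citation unlikely to support the claim; look for a better source")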
Urman points out that the Wikipedia users who tested the SIDE system were twice as likely to prefer neither of the references as they were to prefer the AI-suggested ones. “This would mean that in these cases, they would still go and search for the relevant citation online,” she says.
What I don't understand is why we can't just feed papers into some sort of text-to-logical-fallacy analysis that checks for known fallacies, checks argument logic, and checks sources (scoring based on study size and the other requirements that qualify good studies), and just stops things right in their tracks before they're added to the "corpus" of knowledge. I'm just talking out of my ass here; I'm sure politics and other human factors stop something so seemingly simple from being implemented.
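To be fair to the politics explanation, the "scores based on study size and other requirements" half is the only part that's easy to sketch; robust fallacy detection in free text is an open research problem, not a checklist. A toy version of the scoring idea, where every field, weight, and threshold is invented for illustration:

    # Toy quality gate for a study before it enters the "corpus".
    # Real methodological scoring (e.g. risk-of-bias checklists) is far
    # more involved; this only shows the shape of the idea.
    from dataclasses import dataclass

    @dataclass
    class Study:
        n_participants: int
        randomized: bool
        preregistered: bool
        conflicts_declared: bool

    def quality_score(s: Study) -> float:
        score = min(s.n_participants / 1000, 1.0)  # sample size, capped at 1.0
        score += 1.0 if s.randomized else 0.0
        score += 0.5 if s.preregistered else 0.0
        score += 0.25 if s.conflicts_declared else 0.0
        return score / 2.75                        # normalize to 0..1

    s = Study(n_participants=48, randomized=False,
              preregistered=False, conflicts_declared=True)
    if quality_score(s) < 0.5:                     # arbitrary cutoff
        print("hold for human review before citing")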
Perhaps someone who has read the paper itself can address this question:
If you input a claim that is unsupported and then search for support for it, isn't that a fundamental error? That is, is the software assuming the claim is accurate?
When SIDE’s results were shown to a group of Wikipedia users, 21% preferred the citations found by the AI, 10% preferred the existing citations and 39% did not have a preference.
I was wondering about that too. Later there is "[...] Wikipedia users who tested the SIDE system were twice as likely to prefer neither of the references as they were to prefer the AI-suggested ones". Not clear if this is the same as "did not have a preference". And if it's not, now we've got 112%: "prefer neither" would be twice 21% = 42%, and 42% + 21% + 10% + 39% = 112%.
SIDE looks impressive. I wonder how it works and finds these resources.
I think the evaluation is flawed; these subjective numbers would mean nothing if I did the survey. Looking at some Wikipedia pages in the SIDE demo (linked in comments here), it is clear that it in some cases fails to identify what claims are made in the article, and that the subjective choice of references is too binary.
I double-check references on Wikipedia in subjects where I have a basic understanding; it is usually easy to find better references, but it takes so much time. So this is still very impressive.
WP is, unfortunately, not obviously any worse than researchers in general are: https://gwern.net/leprechaun#miscitation
> Trying to correct these articles is sometimes like going to war with editors who refuse to admit they were wrong.
Yep. Because now you are doing 'OR' (original research) by interpreting the paper, especially when the abstract is just lying/spin.
Turns out it's easy to find a reference for anything if you search long enough.
> Ironically, some of the statements with 5 or more citations were the most incorrect.
There is nothing ironic about it. Wikipedia is run by bullies who make up references.
> Effectively, what typically happens due to entropy (decay, latency, quality, dirtiness) will start to move in the opposite direction due to automation and background AI.
This seems like an optimistic interpretation.
> ...feed papers into some sort of text-to-logical-fallacy analysis that checks for known fallacies...
https://www.researchgate.net/publication/340531396_Automated...
> latency in your favorite software product starts going down (with AI turned loose on that legacy software's code base, gradually refactoring and optimizing it in the background)
What annoys me is that all the managers who were accruing technical debt were right. AI will clean it up for them, decades later.