As someone who now works in astronomy, I'm not at all surprised at the high self-citation rate for the field. It is true that a lot of papers are published by large consortia. For example, at LSST (where I work), if you have been working on the project for 2 years, you are considered a "builder" and added as an author to all major project-wide papers.
Those papers, which tend to be long and full of great stuff, are cited a lot, and have hundreds of authors.
I wonder how many of these are cases where the first author has cited other papers on which they are also the first author (or really, at least among the first few authors). It seems like, for the data shown, a citation counts as a self-citation if anyone on the author list appears anywhere on the author list of the cited paper?
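For what it's worth, here's a rough sketch of the difference between those two definitions - flagging on any shared author versus only on the first author(s) - assuming a made-up data layout where each paper carries an ordered author list and the IDs of the papers it cites (the field names and structure are purely illustrative):

```python
# Hypothetical sketch: two ways of flagging a "self-citation".
# Assumes each paper is a dict with an ordered "authors" list and the IDs of
# the papers it cites; the data layout and field names are made up here.

def is_self_citation_any_author(citing, cited):
    """Flag if any citing author appears anywhere on the cited paper."""
    return bool(set(citing["authors"]) & set(cited["authors"]))

def is_self_citation_first_authors(citing, cited, top_n=1):
    """Flag only if one of the first top_n citing authors is also among
    the first top_n authors of the cited paper."""
    return bool(set(citing["authors"][:top_n]) & set(cited["authors"][:top_n]))

def self_citation_rates(papers):
    """Return (any-author rate, first-author rate) over all citation edges."""
    by_id = {p["id"]: p for p in papers}
    edges = [(p, by_id[c]) for p in papers for c in p["cites"] if c in by_id]
    if not edges:
        return 0.0, 0.0
    any_rate = sum(is_self_citation_any_author(a, b) for a, b in edges) / len(edges)
    first_rate = sum(is_self_citation_first_authors(a, b) for a, b in edges) / len(edges)
    return any_rate, first_rate

# Toy example: paper 2 shares an author with paper 1, but not its first author.
papers = [
    {"id": 1, "authors": ["smith", "lee"], "cites": []},
    {"id": 2, "authors": ["lee", "wong"], "cites": [1]},
]
print(self_citation_rates(papers))  # (1.0, 0.0): any-author yes, first-author no
```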
Also for some research niches, you may be one of the few people writing papers on a subject. There's no one else to cite.
I do think there are some very valid points about bringing the reader up to speed on the previous research that led to the current paper. But I don't think those citations should really count as citations in terms of metrics for how successful a scientist is.
To be honest, I find all the metric gaming about number of papers and citations to be ridiculous. I don't hear many people saying they want to write the best paper in their field, or something new. It all seems to be a numbers game these days. Academic career growth hacking, if you will.
This probably varies by field, but the "large project" thing can be gamed too.
So, for example, in biomedicine you often have lots of people on a paper who might only read a draft, make some trivial suggestions, and then be added as an author.
As a result, there's this pressure for large groups to form, where everyone is added and everyone can cite each other.
This doesn't mean the projects are bad, but it does lead to individuals with large citation counts primarily because they find ways to add themselves to everything, regardless of their level of effort. People should get credit where it's due, and large projects involve lots of people. But what defines a "project" has become very vague.
I've become extraordinarily disillusioned with academics. Science gets done, but the rewards seem to filter preferentially to those who are able to game the system, and the system exists out of a need to make oneself look as productive as possible, in areas where contributions are necessarily tiny or nonexistent, even among very competent people, because the problems are hard and because so many people see the same things at the same time.
> I wonder how many of these are cases where the first author has cited other papers on which they are also the first author.
Also, it's worth mentioning that countries which support a PhD by publication essentially require you to conduct self-citing research. This is to show there is a common thread running through your research, so that the PhD can be defended on the grounds that all the papers are on the same subject.
>> To be honest, I find all the metric gaming about number of papers and citations to be ridiculous.
That's the main issue, isn't it? Citations are a bit like tokens that can be exchanged for funding, so they become a commodity that people are incentivised to hoard and trade. That is just the worst kind of environment to promote good-quality research. The only thing it can promote is... lots of citations.
Cobra Effect: https://en.wikipedia.org/wiki/Cobra_effect
"In 2017, a study showed that scientists in Italy began citing themselves more heavily after a controversial 2010 policy was introduced that required academics to meet productivity thresholds to be eligible for promotion"
Self-citation is appropriate for a new paper that builds on the results of a previous paper. But in evaluating how influential a researcher is, it makes sense to exclude self-citation, while being careful to avoid any implication that self-citation is wrong.
The core problem here is that universities think that citation statistics are a useful metric to evaluate the quality of the work of a scientist. There's plenty of evidence that this is not the case or that even the reverse may be the case [1], but this idea refuses to die.
It sucks as a metric, but it does have some rough correlation with quality in most cases, and I'm not aware of any better, easily measurable metric - if you have one in mind, it'd be great to hear it. The alternative of having a bureaucrat "simply judge quality" is IMHO even worse: even less objective, and even more prone to being gamed.
The main problem is that there is an objective need (or desire?) by various stakeholders to have some kind of metric that they can use to roughly evaluate the quality or quantity of a scientist's work, with the caveat that people outside your field need to be able to use it. I.e. let's assume that we have a university or government official who, for some valid reason (there are many of them), needs to be able to compare two mathematicians without spending excessive time on it. Let's assume that the official is honest, competent and in fact is a scientist him/herself and so can do the evaluation "in the way that scientists want" - but that official happens to be, say, a biologist or a linguist. What process should be used? How should that person distinguish insightful, groundbreaking, novel and important research from pseudoscience or salami-sliced papers that bring nothing new to the field? I can evaluate papers and people in my research subfield, but not far outside of it. Peer review for papers exists because we consider that people outside of the field are not qualified to directly tell whether a paper is good or bad.
The other problem, of course, is how do you compare between fields - what data allows you to see that (for example) your history department is doing top-notch research but your economics department is not respected in their field?
I'm not sure that a good measurement can exist, and despite all their deep flaws it seems that we actually can't do much better than the currently used bibliographic metrics and judgement by proxy of journal ratings.
Saying "metric X is bad" doesn't mean "metric X shouldn't get used" unless a better solution is available.
Before everyone goes bananas citing Goodhart's law: many universities and academic medical centers in the US don't care at all about impact factor - they care about grant $$, period full stop. (They appreciate the occasional high-impact paper that they can use in marketing materials, but it's really all about the $$.)
And for what it's worth, I've almost never heard impact factors discussed at NIH study sections, where investigator quality is explicitly on the agenda. Reviewers talk about relevant prior publications in the field, esp in marquee journals. [this latter feature is the reason we don't just put everything on biorxiv or equivalent and move on.]
The flip side of that is in the soft sciences. There are so many PhDs walking around in gender studies and sociology who have published papers that were never cited once!
We do need a metric imo, but I agree we don't have a perfect one yet.
So when querying a count of citations, did anyone consider adding a GROUP BY on contributors, which would at least give you distinct groups of contributors (assuming they are always listed alphabetically)?
Even better, split it out into individual contributors to get a count of distinct researchers who have cited the paper?
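As a rough illustration of the idea, here's a sketch with a toy schema; the table, column names and data are invented for the example, not taken from any real citation database:

```python
# Illustrative only: a toy table where each citation row carries the citing
# paper and one of its authors, so we can count distinct citing researchers
# per cited paper instead of raw citation counts.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE citations (cited_paper TEXT, citing_paper TEXT, citing_author TEXT);
INSERT INTO citations VALUES
  ('P1', 'A', 'alice'), ('P1', 'A', 'bob'),
  ('P1', 'B', 'alice'), ('P1', 'C', 'carol'),
  ('P2', 'D', 'dave'),  ('P2', 'E', 'dave');
""")

# Raw citing-paper count vs. number of distinct researchers who cited the paper.
rows = con.execute("""
    SELECT cited_paper,
           COUNT(DISTINCT citing_paper)  AS citing_papers,
           COUNT(DISTINCT citing_author) AS distinct_citers
    FROM citations
    GROUP BY cited_paper
    ORDER BY distinct_citers DESC;
""").fetchall()

for cited, papers, citers in rows:
    print(f"{cited}: {papers} citing papers, {citers} distinct citing researchers")
```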
I'm not Italian... and am not meeting any productivity threshold.
But my work is incremental, and I obviously don't want to repeat what I said in a different paper, so I cite earlier work in later work. TBH, I don't think it's possible to avoid self-citation unless:
1. Your research is so popular that by the time you need to cite it, it's been surveyed, or improved upon, or otherwise adapted.
2. You switch research subjects relatively often.
3. You publish "blocks" of work, each based on fundamentals in your field established by others - and they're not incremental.
If you narrow yourself to a specific niche well enough, you'll see the same names in citations. To be fair, the areas I dig into don't feel nearly as competitive as say, physics, which I couldn't make heads or tails of.
The whole reason the internet and wikis took off is we were very liberal in how we linked. If we disallowed inbound citations, wouldn't it be a lot harder to backtrack and grasp contextual underpinnings?
Anecdote: In the field of adult attachment theory <-> love there are a few prominent scholars who cite each other: Shaver, Hazan, Mikulincer. They write papers citing their own work and each other's [1]. There's also a book by Mikulincer that highlights Shaver's upbringing with his parents, his past as a hippie, etc. They're delivering very nice content, and they do cite others outside their "circle".
Are there potentially scholars in the field with valuable contributions that go unnoticed? Possibly. It doesn't make self-citations in their papers any less helpful. Also I worry that regulating citations through some system may affect the quality of content and fix something that's not broken.
Which brings me to another issue: aren't we supposed to be helping each other?
[1] Example: http://adultattachmentlab.human.cornell.edu/HazanShaver1990....
Perhaps it would be useful for reviewers to point out which citations do not contribute to the paper? It really is a tough problem. If someone is toiling along in some niche they have carved out, they and their colleagues may be the only ones working in that space. That leads to a lot of cross-citation and self-citation.
That said, if you publish paper A, and then cite it in paper B which builds on that work, then in paper C you really only need to cite paper B if you're building on the work, not B and A. It might make for an interesting data set to plot out those sorts of relationships.
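A toy sketch of what that data set might look like: a tiny, entirely made-up citation graph where we flag references that are already "covered" by another cited paper (C citing A even though B, which C also cites, already cites A):

```python
# Toy A -> B -> C example: flag citations in a paper's reference list that
# another of its references already cites. The graph is a plain dict and
# entirely made up.
citations = {
    "A": [],
    "B": ["A"],        # B builds on A
    "C": ["B", "A"],   # C builds on B, but also re-cites A
}

def redundant_citations(paper, graph):
    """Citations of `paper` that are also cited by another of its citations."""
    direct = set(graph[paper])
    covered = set()
    for ref in direct:
        covered |= direct & set(graph.get(ref, []))
    return covered

for p in citations:
    extra = redundant_citations(p, citations)
    if extra:
        print(f"{p} directly cites {sorted(extra)}, which another of its references already cites")
# -> C directly cites ['A'], which another of its references already cites
```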
As a reader I personally prefer if they do a more complete set of citations, instead of making me follow up a multi-step chain to dig them up, as if I'm a compiler resolving transitive dependencies. I like little history-map sentences like: "This technique was introduced by Foo (1988) and recast in the modern computational formalism by Bar (2009); the present work uses an optimized variant (Bar 2012)."
You could just cite the last paper here, which is the only one used directly, and which presumably itself cites the earlier papers. But it's more useful to me if you include the version of the sentence that cites all three and briefly explains their relationship.
> if you publish paper A, and then cite it in paper B which builds on that work, then in paper C you really only need to cite paper B if you're building on the work, not B and A.
Logically I agree with you, but a lot of academics seem to believe differently when it comes to citing other people's work, and if we are to go by that logic (which a lot of people are inevitably forced to do), I don't see why one should treat their own work any differently.
You need a source of trust in these systems. Journals used to have that role. They had high standards that were upheld by editors selecting only worthy publications. Today it seems that many journals aren't as trustworthy as they seemed to be in the past. It's also easier to spam journals with submissions and to bullshit your way into publication. The incentives to publish a lot are also way higher now that your grant money is highly dependent on your citation count. Journals can publish more, more easily, and lower their standards for submission to earn more money. The system is basically eating itself and we haven't found a cure yet.
Filtering for self-citations is useful to identify the bubbles. But it is not sufficient to determine if those bubbles only contain hot air or if these scientists are actually working on something with substance in a narrow field where few others publish.
Citations should primarily serve to mention relevant work, which often includes authors' earlier works.
The problem really is the abuse of citation metrics and journal brand names (and especially journal-based metrics) as a means of evaluating researchers. What we really need is a different method of evaluating researchers that does not rely on where they publish or what they cite.
(But I would say that, given that I work on one such a system.)
The opposite of extreme self-citing is self-plagiarism (either out of ignorance, to avoid extreme self-citing on ground-breaking research, or with malicious intent: passing the same paper to multiple journals as a new result).
> The rate of duplication in the rest of the biomedical literature has been estimated to be between 10% to 20% (Jefferson, 1998), though one review of the literature suggests the more conservative figure of approximately 10% (Steneck, 2000). https://ori.hhs.gov/plagiarism-13
If work by another author was enough to inspire you and add a reference, then your own previous work should certainly qualify, if it added inspiration to the current paper. Self-citing provides a "paper trail" for the reader when they want to investigate a claim or proof further.
(Like PageRank, it is entirely possible to discount internal links relative to external ones, and if you also take into account the authority of the citer, you avoid scientists accumulating references from non-peer-reviewed arXiv publications.)
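As a rough sketch of that kind of scheme (assuming we already know which citation edges are self-citations), here is a simplified, illustrative PageRank variant; the graph, the weights and the damping factor are made up, and rank from papers with no outgoing citations is simply not redistributed:

```python
# Rough sketch: a PageRank-style score over a citation graph where
# self-citation edges get a lower weight than external ones.
def weighted_pagerank(edges, damping=0.85, iters=50):
    """edges: list of (citing_paper, cited_paper, weight) tuples."""
    nodes = {n for src, dst, _ in edges for n in (src, dst)}
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in edges:
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, dst, w in edges:
            if out_weight[src] > 0:
                new[dst] += damping * rank[src] * w / out_weight[src]
        rank = new
    return rank

# External citations get full weight; self-citations get, say, a quarter.
edges = [
    ("p1", "p2", 1.0),   # external citation
    ("p3", "p2", 0.25),  # self-citation, down-weighted
    ("p2", "p4", 1.0),
]
print(weighted_pagerank(edges))
```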
I found this situation regularly when going down the rabbit hole of the anti-vaxx or anti-5G people.
One "scientist" makes a highly dubious claim, thousands of nutjobs cite this one scientist, and the "scientist" then goes on to cite articles that cite their work.
I'm basically waiting to find Alex Jones cited in a serious article at this point.
https://en.m.wikipedia.org/wiki/Wikipedia:List_of_citogenesi...
Went into the data and took the top 1000 individuals with self-cite percentages over 40%, then sorted by institution. Nearly every major institution had individuals in this group: Johns Hopkins (4), Caltech (4), Georgia Tech (2), MIT (5), each of the Max Planck Institute campuses (3-7), Moscow State (7), Penn State (6), Stanford (1), Utrecht (2), University of Zurich (4), ETH Zurich (1), DLR (3), Imperial College London (3), University of Tokyo (2), Princeton (5), Kyoto University (4)...
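A back-of-the-envelope version of that grouping, with a hypothetical table of per-researcher self-citation percentages; only the 40% threshold and the "top 1000" idea come from the comment above, while the data and column names are made up:

```python
# Hypothetical per-researcher self-citation data; group flagged researchers
# by institution to see whether the problem concentrates anywhere.
import pandas as pd

df = pd.DataFrame({
    "researcher":    ["r1", "r2", "r3", "r4"],
    "institution":   ["MIT", "MIT", "Stanford", "Moscow State"],
    "self_cite_pct": [62.0, 45.5, 41.2, 73.9],
})

flagged = (
    df[df["self_cite_pct"] > 40]
      .nlargest(1000, "self_cite_pct")   # "top 1000" from the comment
      .groupby("institution")
      .size()
      .sort_values(ascending=False)
)
print(flagged)  # count of flagged researchers per institution
```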
I feel like if this problem were very concerning we'd see the distribution concentrated at certain institutions but I'm not sure there's one with over 10 researchers at them. We hear a lot about questionable Chinese journals, but the highest institution in this list is the Chinese Academy of Sciences with 3 individuals.
I think the more likely case is there are a few bad apples, some bad practices we can't ever fully get rid of, and that some research lends itself more to self-citation.
For those who wonder why there are climate-change deniers who won't listen to "the science", this article is for you. The sad and honest fact is, science has become a spin factory in the way mainstream media has.
Yes, in theory, the scientific method and process is a wonderful standard. Unfortunately, once it's exposed to egos and profits, it becomes something else, far less worthy of praise and honor.
I'm not doing a takedown of science; science has already done that to itself. The sooner the rest of us come to terms with that, the better.
What surprises me is how naive the scientific community publicly pretends to be on these matters.
We 'marketed' science as if, due to our socioeconomic dogmas - vaguely based on a completely misunderstood, caricatured Darwinian theory - there can be no alternative. We turn science into a quantitative metrics game, and then, by golly, act all surprised that scientists game the system?
How useful do you think Google Search would be if they had just stopped after PageRank v0.1 and called it a day, then let all the websites 'vote with their links'?
People who are working out their ideas far outside the mainstream have no one to cite but themselves. Some are quacks, to be sure, but sometimes a field is just not ready for their work because the utility of the idea is not easily apparent, or because it's perceived to be too risky. A field needs a healthy mix of the curmudgeonly stubborn thinkers going their own way no matter the cost, and those making steady progress on solvable problems.
Sure, but there is no 'pruning of the tree' in case a dead end is reached, so the citations stay allowing the quack to pretend they have more credibility than they do. In fact, the whole idea of these citations is to build credibility where there is none.
The need to step outside of mainstream views has been important for intellectual progress over the years, indeed. The problem we have now is that we're stuck with a series of bad-faith, polarizing and disingenuous views that "suck the oxygen out of" actual wide-ranging thinking - i.e., climate denialists can still get a lot of money, anti-vaxxers can pull in a lot of money from scams, etc.
On a semi-related note, I occasionally look at the news items Google News suggests for me, and these include a significant portion of climate-change denialist propaganda, including one piece shocked, shocked by Nature "suppressing academic freedom with this list" (and my searches are never for climate denialism).
Which is to say, these may be only a few sources, but it seems like they have significant resources behind them, somehow.
Strange, I use Google News on a semi-regular basis, and have the opposite experience; most items are fairly bland coverage of climate-science research announcements, with the occasional hyperbolic doomsday stuff.
My undergrad college physics professor (Fay Ajzenberg-Selove) introduced this metric back in the '50s. She faced major sexism and bullshit claims against her productivity, and had to use this metric to prove her detractors wrong - that she was as good as or better than most of her male colleagues in terms of performing useful and interesting research - to earn herself a faculty position.
https://en.m.wikipedia.org/wiki/Fay_Ajzenberg-Selove
Counting citations is a rubbish metric in general. It's supposed to be a proxy for research quality, but it's so easy to "game" (in the sense of optimising for citation count rather than research quality) that a high number of citations doesn't mean anything.
Neither does a low number of citations. For example, my field is small and kind of esoteric, so we don't get lots of citations either from the outside or the inside (one of the most influential papers in the field has... 286 citations on Semantic Scholar, since 1995).
With a field as small as a couple hundred researchers it's also very easy to give the appearance of a citation mill. Given that papers will focus on a very specific subject in the purview of the field, it is inevitable that each researcher who studies that specific subject will cite the same handful of researchers' papers over and over again - and be herself cited by them, since she's now publishing on the subject that interests them.
As to self-citations, like Ioannidis himself says, there are legitimate reasons - for instance, a PhD student publishing with her thesis advisor as a co-author. The student will most probably be working on subjects that the advisor has already published on and in fact will most likely be extending the advisor's prior work. So the advisor's prior work will be cited in the student's papers.
So I'm really not sure what we're learning in the general case by counting citations, other than that a certain paper has a certain number of citations.
To be fair to some researchers in certain specializations, there may only be a handful of scientists publishing on the topic. Self-citation proceeds naturally from such circumstances.
Having conducted a reasonable amount of academic and scientific research, I'd say this metric is more likely to mischaracterize research than to reveal any issues. It doesn't even establish a causal link between self-citation and poor research quality; it just assumes one.
Most researchers continue to do new research on the same concept after a publication, and they will of course cite their earlier work when continuing. Additionally, post-graduate researchers often have their names placed on the research of grad students they supervise, even though they often have minimal involvement in the research or the conclusions drawn.
You might be able to tell something from the ratio of citations by other authors to self-citations, but only if you could eliminate self-citations that were not either inclusion by proxy or cases where the authors are merely continuing research on the same topic with new methodologies.
There are already methods for identifying bad research, none of which can be achieved through non-human-assisted data analysis of the author lists of papers. The only way to be sure is critical review and third-party verification of results with repeated experiments.