Very interesting article and statistical analysis, but I really don't see how it concludes that the DK effect is wrong based on the analysis. The fact that the DK effect emerges with _completely random data_ is not surprising at all - in this case the intuitive null hypothesis would be that people are good at estimating their skill, therefore there would be a strong correlation between their performance and self-evaluation of said performance. If the data weren't related, then this hypothesis isn't likely, which is exactly what DK means. And indeed if you look at the plots in the article (of the completely random data), they depict a world in which people are very bad at estimating their own skill, therefore, statistically, people with lower skills tend to overestimate their skills, and experts tend to underestimate it.
Also wanted to point out that in general there is no issue with looking at y - x ~ x, this is called the residual plot, and is specifically used to compare an estimate of some value vs. the value itself.
That being said, the author seems very confident in their conclusion, and from the comments seems to have read a lot of related analyses, so I might be missing something. ¯\_(ツ)_/¯
> therefore there would be a strong correlation between their performance and self-evaluation of said performance. If the data weren't related, then this hypothesis isn't likely, which is exactly what DK means.
DK doesn't mean no correlation, it means inverse correlation. It's the correct analysis at the bottom that shows what no correlation actually looks like (at least no correlation in trend; there is heteroskedasticity).
> a world in which people are very bad at estimating their own skill, therefore, statistically, people with lower skills tend to overestimate their skills, and experts tend to underestimate it.
Be careful here, the conclusion you drew doesn't actually follow.
> y - x ~ x, this is called the residual plot
You're giving x and y meaning that they don't have. In the article these are uncorrelated random variables - the plot of y-x ~ x will always look that way. That's however not the case if you're plotting y_hat - y ~ y_hat for a y_hat taken out of a model. That won't be a random variable in your setup.
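To make this concrete, here is a minimal sketch (my own toy example, not anything from the article) showing that for two independent uniform variables the "residual" y - x is automatically and strongly negatively correlated with x:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 100, 100_000)   # "skill": pure noise
    y = rng.uniform(0, 100, 100_000)   # "self-assessment": independent of x

    # Plotting y - x against x puts x on both axes, so the correlation is
    # strongly negative even though x and y share no information at all.
    print(np.corrcoef(x, y - x)[0, 1])   # about -0.71, i.e. -1/sqrt(2)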
The author’s confidence is itself an indication that they’re more likely to be wrong.
Kidding. Well, half-kidding, I did kind of find the tone a bit biting and dismissive, especially towards one of the commenters who was pointing out exactly what you did.
It’s an interesting question to ask whether the uniformly random data “really” exhibits DK or not, and whether that’s interesting. A world where people have 0 ability to assess their own skill and resort to making uniformly random guesses at it is kind of interesting, and of course in such a world more skilled people would end up on average underestimating themselves and vice versa.
But I think the author’s right that obviously nothing psychological is happening here. There’s the psychological effect of no one being able to assess themselves, but the fact that unskilled people overestimate themselves in this world has nothing to do with the fact that they are unskilled.
>The fact that the DK effect emerges with _completely random data_ is not surprising at all - in this case the intuitive null hypothesis would be that people are good at estimating their skill, therefore there would be a strong correlation between their performance and self-evaluation of said performance. If the data weren't related, then this hypothesis isn't likely, which is exactly what DK means.
DK effect is not that low skill people are overconfident and high skill people are underconfident. It is specifically that low skill people are more overconfident than high skill people are underconfident. i.e. if someone's estimated skill is true_skill+bias+noise, then bias_lowskill > -bias_highskill.
This is very clear in the original DK paper, they specifically focus on the supposed metacognitive deficiencies of low-skill people.
The article argues that the graphs supposedly demonstrating this fact, can also be generated from a model that does not have this difference, i.e. where bias_lowskill == bias_highskill.
EDIT: My characterization of the article is not correct, see here[1] for a visualization of the point I'm trying to make.
[1] http://emilkirkegaard.dk/understanding_statistics/?app=Dunni...
I'm definitely not even close to a statistician, but I'm also having a hard time accepting this analysis.
I'll admit that part of it also comes from personal experience, at work and elsewhere. I've met some catastrophically incompetent people who were completely oblivious to their own incompetence, and it has very often felt like the more incompetent they were, the more likely they were to try to do stuff that was waaaay out of their comfort zone, which would make even experienced, competent people tread carefully.
But even ignoring personal experiences, I'm not convinced by the arguments either. I understand what they are saying, but I don't see how this disproves the DK effect.
Even if everyone is equally bad at estimating their own skill, so that their estimate is essentially a completely random variable, we would expect the self-assessment score average to be around 50. If I understand it correctly, this is essentially what figure 9 is demonstrating.
But that figure still says that worse performers are then likely to overestimate their own ability, just as much as it says that better performers are bad at it.
If we look at the original DK figure and contrast it with figure 9 with random data, then I think one way of interpreting the differences is that, yes, worse performers are indeed bad at self-assessment, but they're just kind of bad at it as if their self-assessment is a completely random variable. It then seems to keep being essentially random, but as people's skills improve, the distance between their score and their self-assessment becomes a bit tighter... so in conclusion: most people are pretty bad at self-assessment, but skilled people are a bit less so.
The end result is still that people in the bottom quartiles are going to over-estimate their own ability.
I don't know, maybe this is way out in the weeds. Please school me.
Unskilled people are more random with their self-assessment than skilled people are. It has nothing to do with unskilled people thinking that they know everything.
> Also wanted to point out that in general there is no issue with looking at y - x ~ x, this is called the residual plot, and is specifically used to compare an estimate of some value vs. the value itself.
This article seemed very unconvincing -- and this part noted above, early on in the article, set the tone that made me feel like the author didn't know what they were doing. And even after reading it all, I felt like the standard lay use of DK remained valid.
This just felt like the type of thing I would have thought about as an undergrad, started to write it, and then realized it didn't make sense halfway through it. Or maybe I just missed something as well...
IMO the most interesting thing is not so much that you can get DK from noise, it's that the Nuhfer study was utterly unable to replicate the DK effect. If DK is real, there should have been at least a hint of it visible in the Nuhfer study.
>the author seems very confident in their conclusion
They are, but honestly all that can be concluded safely IMHO is that the original D-K graph doesn't support the existence of the widely discussed "effect" their conclusion describes. Therefore, unless there is more evidence from some other subsequent study, there may not be any evidence for it at all - and if that's the case then there's no proof it exists.
However, even if you prove that their data is not evidence, that doesn't actually say anything about whether the effect exists or not, just that the D-K paper isn't evidence of such an effect.
I don't think that's enough for the author to conclude that "DK is autocorrelation". A more careful conclusion would be that "the DK data do not support DK's conclusion"... but of course that's much less likely to attract click throughs.
> the intuitive null hypothesis would be that people are good at estimating their skill, therefore there would be a strong correlation between their performance and self-evaluation of said performance
That's not really the intended interpretation of "null" in "null hypothesis". "Null" does not mean "contrary to the effect you're testing". "Null" means "do not assume dependencies anywhere" and so your description is backwards.
Check out Nuhfer et al 2016, who had a different explanation for why Dunning Kruger wasn't true.
Dunning Kruger effect: lower performers overestimate their ability, and higher performers underestimate their ability.
How did they find that? They asked participants to take a test and then had them do a self-assessment. Both were standardized from 0-100. They rated a participant's self-assessment accuracy by "self-assessment minus test score."
What's wrong with that method? You can't arrogantly self-assess as though you got a 130, and you can't humbly say that you got -50. Because of the standardization, you're bound by 0 and 100. This method makes it almost impossible for higher performers to overestimate their ability and for lower performers to underestimate.
What they actually found was that higher performers tend to be better at self-assessment. Lower performers are less accurate, but in both directions (not just overconfident).
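A rough sketch of that bounding effect (toy numbers of my own, not the paper's data): give everyone a symmetric, unbiased estimation error, clip the self-assessment to the 0-100 scale, and the bottom scorers come out looking overconfident while the top scorers come out looking underconfident:

    import numpy as np

    rng = np.random.default_rng(0)
    score = rng.uniform(0, 100, 10_000)          # actual test scores
    error = rng.normal(0, 25, 10_000)            # symmetric error, no bias either way
    estimate = np.clip(score + error, 0, 100)    # self-assessment is stuck between 0 and 100
    gap = estimate - score                       # "self-assessment minus test score"

    print(gap[score < 25].mean())   # clearly positive: low scorers look overconfident
    print(gap[score > 75].mean())   # clearly negative: high scorers look underconfident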
You've been corrected about "what DK means" in the other comments, but this is not quite the point of the post. This is not about whether DK (as expressed in English words) is true or not — in fact, the author points out in the beginning that it's one of these "everybody knows it's like that" ideas (as it often is with social psychology).
The point is that the original DK paper is bullshit. At least, this plot is. And people tend to miss it until they start to carefully read the labels and think about the caveats. In fact, as presented here it looks like it shouldn't even be accepted as a valid study; this is outright deceptive, maliciously so. If there is assumed to be a correlation between x & y, how about we start by plotting x against y then? I know, it may be messy. It almost certainly will be. Because of that, I personally won't even be offended (but some people might) by you removing the outliers and producing an unnaturally clean version of the plot in the end to highlight the main idea. Then some statistical tests to quantify the results. But here we see nothing; it really is just comparing x to x.
IMO, this is pretty much the invariant of most of the problems of academic research in the last God-knows-how-many decades (maybe it always was, I don't know). Computer science papers without the code. Data science papers without the data. Yeah-yeah, I've heard hundreds of excuses why researchers do it like that. But it's pointless, such "research" shouldn't be accepted by anybody. Either you make your findings actually public by providing everything to replicate every single step of your study (which is supposed to be the point), or you just don't publish anything and keep the research proprietary (I mean, obviously it's never black and white, there always will be concerns about test-subject anonymity, etc. — but it's ridiculous to discuss that when the accepted standard even in "proper" sciences is 20 pages of dense text which might never even get to the point of the study, i.e., actually showing the data to any extent).
> I really don't see how it concludes that the DK effect is wrong based on the analysis.
Neither do I. Basically what the article actually shows is that these two statements are equivalent:
(1) People with low test scores tend to overpredict their test scores, while people with high test scores tend to underpredict their test scores.
(2) People's predictions of their test scores are uncorrelated (or more precisely very weakly correlated [1]) with their actual test scores.
This is not a statement that the D-K effect is wrong. It's just restating what the D-K effect is in different words. All the talk about "autocorrelation" is just another way of saying that, if people's predictions of their test scores are only weakly correlated with their test scores, then people with low test scores will have to overpredict their test scores (because there's virtually no room to underpredict them--there's a minimum possible test score and their actual score is already close to it), and people with high test scores will have to underpredict them (because there's virtually no room to overpredict them--there's a maximum possible test score and their actual score is already close to it). But the real question is: why are x and y so weakly correlated? Why are people's predictions of their test scores so weakly correlated with their actual test scores? That is not what one would intuitively expect. That is the question the D-K effect raises, and the author not only doesn't answer it, he doesn't even see it.
Also, this statement in the description of the Nuhfer research doesn't make sense:
"What’s important here is that people’s ‘skill’ is measured independently from their test performance and self assessment."
Um, the test performance is the people's "skill". And in the original D-K research, it was "measured independently" from the people's self-assessment (their prediction of their test performance).
[1] Notice that in the "uncorrelated data" graph, Figure 10, the red line is basically horizontal. That's what you get when x and y are uncorrelated. But in the original D-K graph, Figure 2, the thick black line is not horizontal--it slopes upward. That's what you get when x and y are weakly correlated. If the author had put in a weak correlation between x and y in his own experiment, he would have gotten a graph that looked like Figure 2. But of course that still would do nothing to explain why x and y are so weakly correlated, which is the actual question.
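To illustrate (a quick sketch of my own, with an arbitrary mixing weight, not anything taken from the article): give the self-estimate a small amount of signal on top of noise and the quartile-averaged perceived percentile rises with actual score, like the sloped line in Figure 2, instead of lying flat like Figure 10:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    score = rng.uniform(0, 100, n)
    # mostly noise plus a little signal; the 0.2 weight is arbitrary
    estimate = 0.2 * score + 0.8 * rng.uniform(0, 100, n)

    # convert both to percentiles, then average estimates within score quartiles
    score_pct = score.argsort().argsort() / n * 100
    est_pct = estimate.argsort().argsort() / n * 100
    for q in range(4):
        in_q = (score_pct >= 25 * q) & (score_pct < 25 * (q + 1))
        print(q + 1, round(est_pct[in_q].mean()))   # rises with quartile, slope well below 1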
Please feel free to tell me why my interpretation is wrong. I understand stats just well enough to get myself in trouble.
The line for actual ability is basically x=y. If you scored 10%, you're in the bottom quartile. If you scored 100%, you're in the top quartile. That line isn't really data, just something for comparison. The perceived ability line is the one that utilizes the data. It seems to show that once you average out what everyone rated themselves, it ends up kind of in the middle between ~55-70%. So the people who scored 10% assumed, on average, they would score around 55%. The people who scored 100% assumed, on average, that they would score about 75%. That makes the average expected score much higher than the actual score on the low end and somewhat lower than the actual score on the high end.
I'd interpret this as the bottom quartile thinks they're average and the top quartile thinks they're a bit above average. So basically everyone thinks they're average-ish, but the people who did worst on the test were the most wrong about that. But then again, I can't remember what the questions on the test were even about and just the single graph isn't terribly useful to argue over because it's missing all of the context of the paper.
Now that I've sat and interpreted the graph using my own set of notions about what the numbers mean and what the graph actually shows, I feel like this ought to be used as one of those life lessons about how quotes and diagrams outside of their context within a paper are the epitome of the phrase "lies, damn lies, and statistics." Statistics aren't always lies, but they're incredibly easy to bend to your own biases and assumptions.
The fact that the statistical artifact is seen in completely uncorrelated data is only shown as a demonstration that it is not itself evidence of the claimed effect. To gather evidence that the effect doesn't actually exist you need a new experiment, not just a new analysis of the same data, because the different levels of actual skill need to be established separately from establishing the error in skill self-assessment. In the original experiment the test results were used to establish both, which is not sufficient.
But the article presents the results from just such a new experiment. In this one they used university education level (sophomore through to professor) and measured skill self-assessment level within those groups. Higher education level (a good proxy for skill on the test used, which was about science literacy) was found to be associated with more accurate skill self-assessment but the bias of lower-skilled people overestimating their skills was not observed.
That's just one study of course, but it sounds like a much better designed one than the original and does constitute actual evidence that the Dunning-Kruger effect doesn't exist.
It depends on what one means by the "Dunning-Kruger effect."
I had the same impulse that the analysis did not disprove DK, but after sitting with it for rather too long, I agree with the analysis.
I think there are two competing DK effect definitions that are being conflated, one descriptive and one explanatory:
1. DK shows that less skilled people overestimate their ability, and highly skilled people underestimate it
2. DK shows that people's estimation of their ability is causally determined by their actual ability
I believe you are claiming, correctly, that the article does not disprove the first definition, which describes the observation, but I think the article is trying to disprove the second definition, which explains why it occurs.
In other words: Yes, there is an observable Dunning-Kruger effect in the sense that we're bad at self evaluation. Is that effect attributable to one's actual level of competence? The evidence for that appears to be a statistical artifact, and further experiments seem to disprove that conjecture.
I'm not a statistician or a psychologist.
Yup. That's what I was going to say. The data suggests that everyone kind of estimates their ability similarly, so that more skilled people underestimate their ability (impostor syndrome) and less skilled people overestimate theirs.
Excerpt from a newer paper by Nuhfer (2017) adds more clarity:
“… Our data show that peoples' self-assessments of competence, in general, reflect a genuine competence that they can demonstrate. That finding contradicts the current consensus about the nature of self-assessment. Our results further confirm that experts are more proficient in self-assessing their abilities than novices and that women, in general, self-assess more accurately than men. The validity of interpretations of data depends strongly upon how carefully the researchers consider the numeracy that underlies graphical presentations and conclusions. Our results indicate that carefully measured self-assessments provide valid, measurable and valuable information about proficiency. …”
https://www.researchgate.net/publication/312107583_How_Rando...
The article is correct. The effect is statistical not psychological. It emerges even from artificial data and occurs independently of the supposed psychological justifications even for data where those justifications are clearly removed.
If you adjust the experiment design to avoid introducing the auto-correlation you get data that doesn't show the DK effect at all. Some might take issue with the adjusted experiment, as using seniority-related categories like "sophomore" and "junior" as skill levels has its own issues. To show the DK effect is real you need to come up with a better adjusted experiment that avoids the autocorrelation while still producing data that shows the effect. It's unclear if that's possible.
Just anecdotal, but my life observation of DK is often highly intelligent and competent people in a particular field who then generalize that to pontificate and proclaim, directly or indirectly, superior understanding to certified domain experts (e.g., have directly related advanced degree(s), work in the field for decades.)
It thus seems as much or more a psychological effect - in short, people with a know-it-all personality type and a sense of superiority, who have never done the deep and hard work to gain or demonstrate any competency in said areas.
A common side observation is of course unfounded conspiracy theories, that the derided experts have sinister intentions.
It is, though. This article says that if people are bad at estimating their skill (towards randomness / the midpoint), then bad people will overestimate, and good people will underestimate.
The psychological part is that people indeed will assume they're closer to the mean than they actually are. DK effect would not be seen if people correctly estimated their skill, nor if experts overestimated, nor if the incompetent underestimated.
Yes, you are right, and I don't understand how so many commenters in this thread can so confidently state that the article is wrong. I just went ahead and did the random simulation myself and you get the "Dunning Kruger effect", which is exactly what the author paints it as: autocorrelation.
>If you adjust the experiment design to avoid introducing the auto-correlation you get data that doesn't show the DK effect at all.
This is even in the article! Yet some people are making claims against this despite references to the contrary.
Modern psychology has had a lot of these sorts of results over the last decade; none of its methods are holding up under proper scrutiny. They are struggling to reproduce findings, but more critically even the reproduced ones are turning out to be statistical and mathematical errors like the one shown here. Some of the findings have also done severe harm to patients over the decades. I can't help but think we need a lot of caution when it comes to psychology results, given its harmful uses (such as the abuse of ill patients) and its lack of truthful results.
I'm convinced it's associated with the methodology of how psychology has approached matters.
In the world of biology, you're observing the world around you. Same for physics, chemistry, et al. This means that you can set up proper controls to obscure your own presence from any potential results (e.g., isolate everything in another room, use cameras to avoid being near animals, etc.)
Psychology has the same nightmare as quantum physics: pre-existing thoughts and beliefs literally define what results you end up with.
I'm convinced that psych is a victim of the "new" way of doing science: treating the Scientific Method™ as a self-evident concept instead of regarding science as a vastly certain domain of metaphysics.
So many psychology experiments are so fantastically underpowered that, if the effect they are attempting to measure were real, odds are that it would have to be of such a large magnitude that it would completely overturn all of our beliefs on how humans function. And they can publish with a P value of 0.049.
So, implicit in the standards for publishing new research in psychology is "We think there is much greater than a 5% chance that our entire field is wrong" which is not a great place to start from.
This was interesting to me so I spent a while this AM playing with a Python simulation of this effect. I used a simple process model of a normally-distributed underlying 'true skill' for participants, a test with questions of varying difficulty, some random noise in assessing whether the person would get the question right, noise in people's assessments of their own ability, etc.
I fiddled with number of test questions, amounts of variation in question difficulty, various coefficients, etc.
In none of my experiments did I add a bias on the skill axis.
My conclusion is that the "slope < 1" part of the DK effect (from their original graph) is very easy to reproduce as an artifact of the methodology. I could reproduce the rough slope of the DK quartiles graph with a variety of reasonable assumptions. (One simple intuition is that there is noise in the system but people are forced to estimate their percentiles between 0 and 100, meaning that it's impossible for the actual lowest-skill person to underestimate their skill. There are probably other effects too.)
However, I didn't find an easy way using my simulation to reproduce the "intercept is high" part of the DK effect to the extent present in the DK graphs, i.e. where the lowest quartile's average self-estimated percentile is >55%. (*)
However, it strikes me that without a very careful explanation to the test subjects of exactly how their peer group was selected, it's easy to imagine everyone being wrong in the same direction.
(*) EDIT: I found a way to raise the intercept quite a lot simply by modeling that people with lower skill have higher variance (but no bias!) in their own skill estimation. This model is supported by another paper the article references.
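For anyone curious, here's a stripped-down sketch of that kind of process model (made-up coefficients, not my exact script): true skill is normal, questions vary in difficulty, answers are noisy, and self-estimates are unbiased but noisier for lower-skill people, clipped to the 0-100 scale:

    import numpy as np

    rng = np.random.default_rng(0)
    n_people, n_questions = 2_000, 20

    skill = rng.normal(0, 1, n_people)            # normally distributed true skill
    difficulty = rng.normal(0, 1, n_questions)    # questions of varying difficulty

    # chance of answering each question correctly, logistic in (skill - difficulty)
    p = 1 / (1 + np.exp(-(skill[:, None] - difficulty[None, :])))
    score = (rng.random((n_people, n_questions)) < p).mean(axis=1) * 100

    # self-estimate: centred on the true expected score (no bias on the skill axis),
    # but with more noise for lower-skill people, then clipped to the 0-100 scale
    noise_sd = 10 + 20 / (1 + np.exp(skill))
    estimate = np.clip(p.mean(axis=1) * 100 + rng.normal(0, noise_sd), 0, 100)

    quartile = np.digitize(score, np.quantile(score, [0.25, 0.5, 0.75]))
    for q in range(4):
        m = quartile == q
        print(q + 1, round(score[m].mean()), round(estimate[m].mean()))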
Wouldn't variance be influenced by a similar bounding effect but this time from the upper side? That is, if your true skill is 98% you aren't going to ever overestimate by more than 2%, but if your true skill is 50% you could be off by up to 50% in either direction.
If we assume random data then the people at the lower end will over-estimate their own performance the same amount that people on the higher end will under-estimate theirs.
However, if the under-performers consistently over-estimate more than the over-performers under-estimate there is still some merit to the effect, isn't there?
That is, the interesting number is the difference between integral of y-x on lower half vs the integral of y-x on the upper half. Does that make sense to anyone else?
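In code, something like this (using purely random data as the assumed baseline): with no real bias the two halves cancel out, and a genuine DK-style asymmetry would show up as the sum being clearly nonzero:

    import numpy as np

    rng = np.random.default_rng(0)
    score = rng.uniform(0, 100, 100_000)
    estimate = rng.uniform(0, 100, 100_000)   # baseline: purely random self-assessment
    gap = estimate - score

    lower = gap[score < 50].mean()    # average over-estimation in the lower half (~ +25)
    upper = gap[score >= 50].mean()   # average under-estimation in the upper half (~ -25)
    print(lower, upper, lower + upper)   # the sum is ~0 here; a real bias would shift it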
Yeah, I think so and we're probably in the minority here. There are a couple of other comments referring to regression to the mean and that the article takes a literalist view which is perhaps unwarranted. You win the followup comment. ;-)
I confess that I've never paid that much attention to the classic D-K graph, and taking a close look at it now, it is most assuredly crap. Now I want to know what the plots of the actual scores for those quartiles look like, rather than %ile or after-the-fact ranking. Yeah, it sure looks like people mostly figure they're in the 55-75 %ile range, if that's what that actually is, and that where in that spread they think they are correlates with their actual ranking.
Let's go down a Bayesian rabbit hole. Let's assume, as does the article, that people's self estimations are completely random rubbish: the worst people have nowhere to go but up, the best nowhere but down. Yup, completely agree.
Now let me ask a question: is self-estimation of any use in determining actual ability? The answer in this case is no: knowing one does not inform our ability to know the other in a Bayesian sense, they are not correlated.
D-K sounds valuable as a cautionary tale concerning excessive exuberance and a tendency not to learn well from experience, but aside from child-proof caps and Mr. Yuk stickers, where we really want to apply the lesson is at the high-performing end of the scale, and here we get into trouble immediately.
It is tempting to say "high-performers have nowhere to go but down" as though maybe we should reject those self-reporting the best performance. The classic chart hints at high performers underestimating their true performance, but it's a crappy chart; maybe they want it to be true.
But in the specific case where there is utterly no correlation and true performance is as evenly distributed as self-assessment, if we chop off the "top X self-reporting" we will chop off just as many poor performers as high performers. Yes, I hear you, and I agree, random is an edge case; I just don't believe that affects its prevalence.
Maybe it is true; alright dust off those priors and have at it.
Article seems to be saying “DK doesn’t exist because it always exists”. Which is… absurd?
The point of DK is that when you don’t know shit, any non-degenerate self assessment will result in overestimating your ability. In short, “there are more natural numbers above smaller natural numbers than bigger ones”. This doesn’t have to do with psychology, and it’s expected that it appears when evaluating random data. That’s a good thing! It means DK exists even when us pesky humans aren’t involved at all, not that DK doesn’t exist at all.
OK, I think I understand. What the data from the original experiment actually shows is that people at all skill levels are pretty bad at estimating their skill level- it's just that if you scored well, the errors are likely to be underestimates, and if you scored badly, the errors are likely to be overestimates, by pure chance alone. So it's not that low scoring individuals are particularly overconfident so much as everyone is imperfect at guessing how well they did. Great observation.
My intuition for this is: given a fixed and known scoring range (say 0..100), when scoring very low there is simply a lot of room for overestimating yourself and when scoring very high there is simply a lot of room for underestimating yourself. So all noise ends up adding to the inverse correlation naturally.
In other words people are quite bad at estimating their skill level. Some people will overestimate, while some other people will underestimate and on average there will be a relatively constant estimated skill level that doesn't change all that much based on the actual abilities.
Given that fact, it logically follows that people who score low on ability tests will more often than not have overestimated their ability (and the same on the other end of the spectrum).
You can frame this effect as autocorrelation if you wish or just as a logical consequence. But that's missing the point.
The point is: why on earth are humans so bad at estimating their own competence level as to make it practically indistinguishable from random guesses.
- what DK claims: there is bias (incompetent people overestimate their ability)
- what the data actually shows: there is greater variance (incompetent people both over- and _under_-estimate to a larger degree compared with more competent people; the data shows heteroscedasticity) but no bias (estimation errors are centered around zero, tighter for the more competent).
That’s what I was thinking; if the average is about constant, all you’ve shown is that everyone is bad at self-assessment (another issue - summarizing a distribution by just its average loses information).
But a comment above quoting the more recent paper presents a contradictory conclusion: that humans can self-assess with some accuracy. So now I’m confused again.
Suppose you make 1000 people take a test. Suppose all 1000 of these people are utterly incapable of evaluating themselves, so they just estimate their grade as a uniform random variable between 0-100, with an average of 50.
You plot the grades of each of the 4 quartiles and it shows a linear increase as expected. Let's say the bottom quartile had an average of 20, and the top had 80. But the average of estimated grades for each quartile is 50. Therefore, people who didn't do well ended up overestimating their score, while people who did well underestimated it.
In reality, nobody had any clue how to estimate their own success. Yet we see the Dunning-Kruger effect in the plot.
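The whole thought experiment fits in a few lines (just a sketch of the setup described above, nothing more):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000
    grade = rng.uniform(0, 100, n)       # actual grades
    estimate = rng.uniform(0, 100, n)    # self-estimates: pure guesses, average ~50

    order = np.argsort(grade)
    for q, idx in enumerate(np.array_split(order, 4), start=1):
        # actual averages climb by quartile while estimates sit near 50 in every quartile
        print(q, round(grade[idx].mean()), round(estimate[idx].mean()))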
I’ve always felt the DK effect is cynical pseudoscience for midwit egos. It’s a sophistic statement of the obvious dressed up as an insight. But worse, it serves to obscure something interesting and beautiful about humans - that even very intellectually challenged people sometimes can, over time, develop behaviours and strategies that nobody else would have thought of, and form a kind of background awareness of their shortcomings even if they aren’t equipped to verbalise them, allowing them to manage their differences and rise to challenges and social responsibilities that were assumed to be beyond their potential. Forrest Gump springs to mind as an albeit fictional example of the phenomenon I’m talking about. I think this is a far more interesting area than the vapid tautology known as the DK effect.
I think there's an easier explanation for the effect, and that is people are just not very good at judging their skill level, and due to reversion to the mean, low-performers probably overestimate and high-performers underestimate.
And also, I think there is actually a tiny bit of DK going on.
And then, as you say, it gets amplified by the pseudo-literati.
The author is onto something that Dunning-Kruger is suspicious, but the argument is wrong. The "statistical noise" plot actually demonstrates a very noteworthy conclusion: that Usain Bolt estimates his own 100m ability as the same as a random child's. This would be a great demonstration of the Dunning-Kruger effect, not a counterargument.
On the other hand, regression to the mean rather than autocorrelation does explain how you could get a spurious Dunning-Kruger effect. Say that 100 people all have some true skill level, and all undergo an assessment. Each person's score will be equal to their true skill level plus some random noise based on how they were performing that day or how the assessment's questions matched their knowledge. There will be a statistical effect where the people who did the worst on the test tend to be people with the most negative idiosyncratic noise term. Even if they have perfect self-knowledge about their true skill, they will tend to overestimate their score on this specific assessment.
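A minimal sketch of that mechanism (my own toy model, with arbitrary noise levels): everyone knows their true skill exactly, yet the lowest scorers still look like they overestimated the score they happened to get:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    true_skill = rng.normal(50, 10, n)
    score = true_skill + rng.normal(0, 10, n)   # assessment = skill + idiosyncratic noise
    estimate = true_skill                       # perfect self-knowledge of true skill

    gap = estimate - score
    quartile = np.digitize(score, np.quantile(score, [0.25, 0.5, 0.75]))
    for q in range(4):
        print(q + 1, round(gap[quartile == q].mean(), 1))   # positive at the bottom, negative at the top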
Regression to the mean has broad relevance, and explains things like why we tend to be disappointed by the sequel to a great novel.
The point is that if people estimate their abilities at random, with no information, it will look like people who perform worse over-estimate their performance. But it isn't because people who are bad at a thing are any worse at estimating their performance than people who are good at the thing: they are both potentially equally bad at estimating their performance, and then one group got lucky and the other didn't.
It would require them to be _even worse than random_ for them to be worse at estimating their abilities, rather than simply being judged for being bad at the task. It is only human attribution bias that leads us to assume that people should already know whether they are good or bad at a task without needing to be told.
The study assumed that the results on the task are non-random, performance is objective, and that people should reasonably have been expected to have updated their uniform Bayesian priors before the study began.
If any of those are not true, we would still see the same correlation, but it wouldn't mean anything except that people shared a reasonable prior about their likely performance on the task.
People will nevertheless attribute "accurate" estimates to some kind of skill or ability, when the only thing that happened is that you lucked into scoring an average score. You could ask people how well they would do at predicting a coin flip and after the fact it would look like whoever guessed wrong over-estimated their "ability" and a person who guessed right under-estimated theirs, even though they were both exactly accurate.
This comment section clearly demonstrates the attribution bias that makes this myth appealing, though. And this blog post demonstrates how difficult it is to effectively explain the implications of Bayesian reasoning without using the concept.
I've felt inadequate throughout most of my early career. That's how I know that the confidence I have today is well deserved.
I've never had impostor syndrome though. To have impostor syndrome, you have to be given opportunities which are significantly above what you deserve.
I did get a few opportunities in my early career which were slightly above my capabilities but not enough to make me feel like an impostor. In the past few years, all opportunities I've been given have been below my capabilities. I know based on feedback from colleagues and others.
For example, when I apply for jobs, employers often ask me "You've worked on all these amazing, challenging projects, why do you want to work on our boring project?" It's difficult to explain to them that I just need the money... They must think that with a resume like mine I should be in very high demand or a millionaire who doesn't need to work.
I've worked for a successful e-learning startup, launched successful open source projects, worked for a YC-backed company, worked on a successful blockchain project. My resume looks excellent but it doesn't translate to opportunities for some reason.
Dunning and Kruger showed that students all thought they were in roughly the 70th percentile, regardless of where they actually ranked. That's it. The plots in the original paper make that point very clear.
It is unnecessary to walk the reader through autocorrelation in order to achieve a poorer understanding of that simple result.
> It’s the (apparent) tendency for unskilled people to overestimate their competence.
Close. It's the cognitive bias where unskilled people greatly overestimate their own knowledge or competence in that domain relative to objective criteria or to the performance of their peers or of people in general.
So, they observe a bias toward the average, and the dependence goes exactly as one would naively expect. If scientists exist to explain things we find interesting, statisticians exist to make those things boring. Seriously, work as a data scientist and you end up busting hopes and dreams as a regular part of your job. Almost everything turns out to be mostly randomness. The famous introduction to a statistical mechanics textbook had me pondering this. If life really is just randomness, it’s hard to find motivation. From a different viewpoint, however, I’ve found that the people that embrace this concept by not trying to control things too much, actually end up with the most enviable results, although I may be guilty of selection bias in that sample.
> Collectively, the three critique papers have about 90 times fewer citations than the original Dunning-Kruger article.5 So it appears that most scientists still think that the Dunning-Kruger effect is a robust aspect of human psychology.6
Critiques cite the work being critiqued (yes, the referenced critiques in TFA cite the Dunning-Kruger study). Also, a 23 year-old paper will inevitably get cited more than 6 year-old papers. But yeah...the inertia in Science is real. That conservatism's a feature, not a bug.
Thanks for introducing me to the term "half-life of knowledge".
> An engineering degree went from having a half life of 35 years in ca. 1930 to about 10 years in 1960. A Delphi Poll showed that the half life of psychology as measured in 2016 ranged from 3.3 to 19 years depending on the specialty, with an average of a little over 7 years.
This is very interesting and makes me wonder what it is for tech careers, e.g. web devs, data scientists etc.
[+] [-] andersource|3 years ago|reply
Also wanted to point out that in general there is no issue with looking at y - x ~ x, this is called the residual plot, and is specifically used to compare an estimate of some value vs. the value itself.
That being said, the author seems very confident in their conclusion, and from the comments seems to have read a lot of related analyses, so I might be missing something. ¯\_(ツ)_/¯
[+] [-] leto_ii|3 years ago|reply
DK doesn't mean no correlation, it means inverse correlation. It's the correct analysis at the bottom that shows what no correlation actually looks like (at least no correlation in tend, there is heteroskedasticity).
> a world in which people are very bad at estimating their own skill, therefore, statistically, people with lower skills tend to overestimate their skills, and experts tend to underestimate it.
Be careful here, the conclusion you drew doesn't actually follow.
> y - x ~ x, this is called the residual plot
You're giving x and y meaning that they don't have. In the article these are uncorrelated random variables - the plot of y-x ~ x will always look that way. That's however not the case if you're plotting y_hat - y ~ y_hat for a y_hat taken out of a model. That won't be a random variable in your setup.
Edit: note on heteroskedasticity
[+] [-] alecbz|3 years ago|reply
Kidding. Well, half-kidding, I did kind of find the tone a bit biting and dismissive, especially towards one of the commenters that were pointing out exactly what you did.
It’s an interesting question to ask whether ask whether the uniformly random data “really” exhibits DK or not, and whether that’s interesting. A world where people have 0 ability to assess their own skill and resort to making uniformly random guesses at it is kind of interesting, and of course in such a world more skilled people would end up on average underestimating themselves and vice versa.
But I think the author’s right that obviously nothing psychological is happening here. There’s the psychological effect of no one being able to assess themselves, but the fact that unskilled people overestimate themselves in this world has nothing to do with the fact that they are unskilled.
[+] [-] _dain_|3 years ago|reply
DK effect is not that low skill people are overconfident and high skill people are underconfident. It is specifically that low skill people are more overconfident than high skill people are underconfident. i.e. if someone's estimated skill is true_skill+bias+noise, then bias_lowskill > -bias_highskill.
This is very clear in the original DK paper, they specifically focus on the supposed metacognitive deficiencies of low-skill people.
The article argues that the graphs supposedly demonstrating this fact, can also be generated from a model that does not have this difference, i.e. where bias_lowskill == bias_highskill.
EDIT: My characterization of the article is not correct, see here[1] for a visualization of the point I'm trying to make.
[1] http://emilkirkegaard.dk/understanding_statistics/?app=Dunni...
[+] [-] IceDane|3 years ago|reply
I'll admit that part of it also comes from personal experience, at work and elsewhere. I've met some catastrophically incompetent people were completely oblivious to their own incompetence, and this has very often felt like that the more incompetent they were, the more likely they were to be try to do stuff that was waaaay out of their comfort zone, which would make even experienced, competent people tread carefully.
But even ignoring personal experiences, I'm not convinced by the arguments either. I understand what they are saying, but I don't see how this disproves the DK effect.
Even if everyone is equally bad at estimating their own skill, so that their estimate is essentially a completely random variable, then we would expect the self-assessment score average to be around 50. If I understand it correctly, this is essentially what figure 9 is demonstrating.
But that figure still says that worse performers are then likely to overestimate their own ability, just as much as it says that better performers are bad at it.
If we look at the original DK figure and contrast it with figure 9 with random data, then I think one way of interpreting the differences is that, yes, worse performers are indeed bad at self-assessment, but they're just kind of bad at it as if their self-assessment is a completely random variable. It then seems to keep being essentially random but as people's skills improve, the distance between their score and their self-assessment becomes a bit tighter.. so in conclusion: most people are pretty bad at self-assessment, but skilled people are a bit less so.
The end result is still that people in the bottom quartiles are going to over-estimate their own ability.
I don't know, maybe this is way out in the weeds. Please school me.
[+] [-] uldos|3 years ago|reply
[+] [-] kenjackson|3 years ago|reply
This article seemed very unconvincing -- and this part noted above, early on in the article set the tone that I felt like the author didn't know what they were doing. And even after reading it all, I felt like the standard lay use of DK remained valid.
This just felt like the type of thing I would have thought about as an undergrad, started to write it, and then realized it didn't make sense halfway through it. Or maybe I just missed something as well...
[+] [-] usefulcat|3 years ago|reply
[+] [-] Accujack|3 years ago|reply
They are, but honestly all that can be concluded safely IMHO is that the original D-K graph doesn't support that the widely discussed "effect" which their conclusion describes exists. Therefore unless there is more evidence from some other subsequent study there may not be any evidence for it at all, and if that's the case then there's potentially no proof it exists.
However, even if you prove that their data is not evidence, that doesn't actually say anything about whether the effect exists or not, just that the D-K paper isn't evidence of such an effect.
I don't think that's enough for the author to conclude that "DK is autocorrelation". A more careful conclusion would be that "the DK data do not support DK's conclusion"... but of course that's much less likely to attract click throughs.
[+] [-] uoaei|3 years ago|reply
That's not really the intended interpretation of "null" in "null hypothesis". "Null" does not mean "contrary to the effect you're testing". "Null" means "do not assume dependencies anywhere" and so your description is backwards.
[+] [-] bitshiftfaced|3 years ago|reply
Dunning Kruger effect: lower performers overestimate their ability, and higher performers underestimate their ability.
How did they find that? They asked participants to take a test and then had them do a self-assessment. Both were standardized from 0-100. They rated a participant's self-assessment accuracy by "self-assessment minus test score."
What's wrong with that method? You can't arrogantly self-assess as though you got a 130, and you can't humbly say that you got -50. Because of the standardization, you're bound by 0 and 100. This method makes it almost impossible for higher performers to overestimate their ability and for lower performers to underestimate.
What they actually found was that higher performers tend to be better at self-assessment. Lower performers are less accurate, but in both directions (not just overconfident).
[+] [-] krick|3 years ago|reply
The point is, that the original DK paper is bullshit. At least, this plot is. And people tend to miss it, until they start to carefully read the labels and think about the caveats. In fact, as presented here it looks like it shouldn't even be accepted as a valid study, this is outright deceptive, maliciously so. If there is assumed to be a correlation between x & y, how about we start by plotting x against y then? I know, it may be messy. It almost certainly will be. Because of that, I personally won't even be offended (but some people might) by you removing the outliers and producing the unnaturally clean version of the plot in the end to highlight the main idea. Then some statistical tests to make the results quantified. But here we see nothing, it really is just comparing x to x.
IMO, this is pretty much the invariant of most of the problems of academic research in the last God-knows-how-many decades (maybe always was, I don't know). Computer science papers without the code. Data science papers without the data. Yeah-yeah, I've heard hundreds of excuses why researchers do it like that. But it's pointless, such "research" shouldn't be accepted by anybody. Either you make your findings actually public by providing everything to replicate every single step of your study (which is supposed to be the point), or you just don't publish anything and keep the research proprietary (I mean, obviously it's never black and white, there always will be concerns about test-subject anonymity, etc. — but it's ridiculous to discuss that when the accepted standard even in "proper" sciences are 20 pages of dense text which might never even get to the point of the study, i.e., actually showing the data to any extent.)
[+] [-] pdonis|3 years ago|reply
Neither do I. Basically what the article actually shows is that these two statements are equivalent:
(1) People with low test scores tend to overpredict their test scores, while people with high test scores tend to underpredict their test scores.
(2) People's predictions of their test scores are uncorrelated (or more precisely very weakly correlated [1]) with their actual test scores.
This is not a statement that the D-K effect is wrong. It's just restating what the D-K effect is in different words. All the talk about "autocorrelation" is just another way of saying that, if people's predictions of their test scores are only weakly correlated with their test scores, then people with low test scores will have to overpredict their test scores (because there's virtually no room to underpredict them--there's a minimum possible test score and their actual score is already close to it), and people with high test scores will have to underpredict them (because there's virtually no room to overpredict them--there's a maximum possible test score and their actual score is already close to it). But the real question is: why are x and y so weakly correlated? Why are people's predictions of their test scores so weakly correlated with their actual test scores? That is not what one would intuitively expect. That is the question the D-K effect raises, and the author not only doesn't answer it, he doesn't even see it.
Also, this statement in the description of the Nuhfer research doesn't make sense:
"What’s important here is that people’s ‘skill’ is measured independently from their test performance and self assessment."
Um, the test performance is the people's "skill". And in the original D-K research, it was "measured independently" from the people's self-assessment (their prediction of their test performance).
[1] Notice that in the "uncorrelated data" graph, Figure 10, the red line is basically horizontal. That's what you get when x and y are uncorrelated. But in the original D-K graph, Figure 2, the thick black line is not horizontal--it slopes upward. That's what you get when x and y are weakly correlated. If the author had put in a weak correlation between x and y in his own experiment, he would have gotten a graph that looked like Figure 2. But of course that still would do nothing to explain why x and y are so weakly correlated, which is the actual question.
[+] [-] msrenee|3 years ago|reply
The line for actual ability is basically x=y. If you scored 10%, you're in the bottom quartile. If you scored 100%, you're in the top quartile. That line isn't really data, just something for comparison. The perceived ability line is the one that utilizes the data. It seems to show that once you average out what everyone rated themselves, it ends up kind of in the middle between ~55-70%. So the people who scored 10% assumed, on average, they would score around 55%. The people who scored 100% assumed, on average, that they would score about 75%. That makes the average expected score much higher than the actual score on the low end and somewhat lower than the actual score on the high end.
I'd interpret this as the bottom quartile thinks they're average and the top quartile thinks they're a bit above average. So basically everyone thinks they're average-ish, but the people who did worst on the test were the most wrong about that. But then again, I can't remember what the questions on the test were even about and just the single graph isn't terribly useful to argue over because it's missing all of the context of the paper.
Now that I've sat and interpreted the graph using my own set of notions about what the numbers mean and what the graph actually shows, I feel like this ought to be used as one of those life lessons about how quotes and diagrams outside of their context within a paper are the epitome of the phrase "lies, damn lies, and statistics." Statistics aren't always lies, but they're incredibly easy to bend to your own biases and assumptions.
[+] [-] omnicognate|3 years ago|reply
But the article presents the results from just such a new experiment. In this one they used university education level (sophomore through to professor) and measured skill self-assessment level within those groups. Higher education level (a good proxy for skill on the test used, which was about science literacy) was found to be associated with more accurate skill self-assessment but the bias of lower-skilled people overestimating their skills was not observed.
That's just one study of course, but it sounds like a much better designed one than the original and does constitute actual evidence that the Dunning-Kruger effect doesn't exist.
[+] [-] ohwellhere|3 years ago|reply
I had the same impulse that the analysis did not disprove DK, but after sitting with it for overlong I agree with the analysis.
I think there are two competing DK effect definitions that are being conflated, one descriptive and one explanatory:
1. DK shows that less skilled people overestimate their ability, and highly skilled people underestimate it
2. DK shows that people's estimation of their ability is causally determined by their actual ability
I believe you are claiming, correctly, that the article does not disprove the first definition that explains the observation, but I think the article is trying to disprove the second definition that explains why it occurs.
In other words: Yes, there is an observable Dunning-Kruger effect in the sense that we're bad at self evaluation. Is that effect attributable to one's actual level of competence? The evidence for that appears to be a statistical artifact, and further experiments seem to disprove that conjecture.
I'm not a statistician or a psychologist.
[+] [-] longtimegoogler|3 years ago|reply
[+] [-] a-dub|3 years ago|reply
[+] [-] diwank|3 years ago|reply
“… Our data show that peoples' self-assessments of competence, in general, reflect a genuine competence that they can demonstrate. That finding contradicts the current consensus about the nature of self-assessment. Our results further confirm that experts are more proficient in self-assessing their abilities than novices and that women, in general, self-assess more accurately than men. The validity of interpretations of data depends strongly upon how carefully the researchers consider the numeracy that underlies graphical presentations and conclusions. Our results indicate that carefully measured self-assessments provide valid, measurable and valuable information about proficiency. …”
https://www.researchgate.net/publication/312107583_How_Rando...
[+] [-] TimPC|3 years ago|reply
If you adjust the experiment design to avoid introducing the auto-correlation you get data that doesn't show the DK effect at all. Some might take issue with the adjusted experiment as using seniority related categories like "sophomore" and "junior" as skill levels has its own issues. To show the DK effect is real you need to come up with a better adjusted experiment that avoids the autocorrelation while still generating data that generates the effect. It's unclear if that's possible.
[+] [-] hungrygs|3 years ago|reply
[+] [-] knorker|3 years ago|reply
It is, though. This article says that if people are bad at estimating their skill (towards randomness / the midpoint), then bad people will overestimate, and good people will underestimate.
The psychological part is that people indeed will assume they're closer to the mean than they actually are. DK effect would not be seen if people correctly estimated their skill, nor if experts overestimated, nor if the incompetent underestimated.
[+] [-] ImaCake|3 years ago|reply
>If you adjust the experiment design to avoid introducing the auto-correlation you get data that doesn't show the DK effect at all.
This is even in the article! Yet some people are making claims against this despite references to the contrary.
[+] [-] unknown|3 years ago|reply
[deleted]
[+] [-] PaulKeeble|3 years ago|reply
[+] [-] Phileosopher|3 years ago|reply
In the world of biology, you're observing the world around you. Same for physics, chemistry, et al. This means that you can set up proper controls to obscure your own presence from any potential results (e.g., isolate everything in another room, use cameras to avoid being near animals, etc.)
Psychology has the same nightmare as quantum physics: pre-existing thoughts and beliefs literally define what results you end up with.
I'm convinced that psych is a victim of the "new" way of doing science: treating the Scientific Method™ as a self-evident concept instead of regarding science as a vastly certain domain of metaphysics.
[+] [-] aidenn0|3 years ago|reply
So, implicit in the standards for publishing new research in psychology is "We think there is much greater than a 5% chance that our entire field is wrong" which is not a great place to start from.
[+] [-] roguecoder|3 years ago|reply
[+] [-] Dave_Rosenthal|3 years ago|reply
I fiddled with number of test questions, amounts of variation in question difficulty, various coefficients, etc.
In none of my experiments did I add a bias on the skill axis.
My conclusion is that the "slope < 1" part of the DK effect (from their original graph) is very easy to reproduce as an artifact of the methodology. I could reproduce the rough slope of the DK quartiles graph with a variety of reasonable assumptions. (One simple intuition is that there is noise in the system but people are forced to estimate their percentiles between 0 and 100, meaning that it's impossible for the actual lowest-skill person to underestimate their skill. There are probably other effects too.)
However, I didn't find an easy way using my simulation to reproduce the "intercept is high" part of the DK effect to the extent present in the DK graphs, i.e. where the lowest quartile's average self-estimated percentile is >55%. (*)
However, it strikes me that without a very careful explanation to the test subjects of exactly how their peer group was selected, it's easy to imagine everyone being wrong in the same direction.
(*) EDIT: I found a way to raise the intercept quite a lot simply by modeling that people with lower skill have higher variance (but no bias!) in their own skill estimation. This model is supported by another paper the article references.
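A minimal sketch of the kind of simulation described above (my own reconstruction, not the commenter's actual code; all coefficients are made-up assumptions): each person has a true skill percentile, their self-estimate is that percentile plus unbiased noise, clipped to [0, 100], and optionally the noise is larger for low-skill people. Averaging by skill quartile reproduces the "slope < 1" pattern, and the skill-dependent variance raises the intercept without any bias.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # True skill percentiles, uniform over [0, 100].
    skill = rng.uniform(0, 100, n)

    def self_estimate(skill, base_sd=15, extra_sd_for_low_skill=0.0):
        # Unbiased noise, optionally larger for low-skill people.
        sd = base_sd + extra_sd_for_low_skill * (100 - skill) / 100
        return np.clip(skill + rng.normal(0, sd), 0, 100)  # forced into [0, 100]

    for extra in (0.0, 30.0):
        est = self_estimate(skill, extra_sd_for_low_skill=extra)
        quartile = np.digitize(skill, np.percentile(skill, [25, 50, 75]))
        print(f"extra low-skill noise = {extra}")
        for q in range(4):
            m = quartile == q
            print(f"  quartile {q + 1}: actual {skill[m].mean():5.1f}, "
                  f"self-estimate {est[m].mean():5.1f}")

The clipping alone flattens the slope (the lowest-skill person literally cannot underestimate), and the extra variance at the low end pushes the bottom quartile's average estimate up toward the classic high intercept.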
[+] [-] SomewhatLikely|3 years ago|reply
[+] [-] askasp|3 years ago|reply
However, if the under-performers consistently over-estimate by more than the over-performers under-estimate, there is still some merit to the effect, isn't there?
That is, the interesting number is the difference between the integral of y - x over the lower half and the integral of y - x over the upper half. Does that make sense to anyone else?
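One way to make that number concrete (a sketch under the article's uniform-random assumptions, with x = actual percentile and y = self-estimate; real data could be substituted for the two uniforms):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    x = rng.uniform(0, 100, n)  # actual percentile
    y = rng.uniform(0, 100, n)  # self-estimated percentile, unrelated to x

    lower = x < np.median(x)
    over_by_lower = np.mean(y[lower] - x[lower])     # bottom half's average overestimate
    under_by_upper = np.mean(x[~lower] - y[~lower])  # top half's average underestimate

    print(f"bottom half overestimates by {over_by_lower:.1f} points")
    print(f"top half underestimates by   {under_by_upper:.1f} points")
    print(f"asymmetry: {over_by_lower - under_by_upper:.2f}")

In the purely random world the two halves are mirror images and the asymmetry is about zero; any real asymmetry would be the extra overestimation being asked about.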
[+] [-] m3047|3 years ago|reply
I confess that I've never paid much attention to the classic D-K graph, and taking a close look at it now, it is most assuredly crap. Now I want to know what the plots of the actual scores for those quartiles look like, rather than the %ile or after-the-fact ranking. Yeah, it sure looks like people mostly figure they're in the 55-75 %ile range, if that's what that actually is, and that where in that spread they think they are correlates with their actual ranking.
Let's go down a Bayesian rabbit hole. Let's assume, as does the article, that people's self estimations are completely random rubbish: the worst people have nowhere to go but up, the best nowhere but down. Yup, completely agree.
Now let me ask a question: is self-estimation of any use in determining actual ability? The answer in this case is no: knowing one does not inform our ability to know the other in a Bayesian sense, they are not correlated.
D-K sounds valuable as a cautionary tale concerning excessive exuberance and a tendency not to learn well from experience, but aside from child-proof caps and Mr. Yuk stickers where we really want to apply the lesson is at the high-performing end of the scale and here we get into trouble immediately.
It is tempting to say "high-performers have nowhere to go but down" as though maybe we should reject those self-reporting the best performance. The classic chart hints at high performers underestimating their true performance, but it's a crappy chart; maybe they want it to be true.
But in the specific case where there is utterly no correlation and true performance is as evenly distributed as self-assessment, if we chop off the "top X self-reporting" we will chop off just as many poor performers as high performers. Yes, I hear you, and I agree, random is an edge case; I just don't believe that affects its prevalence.
Maybe it is true; alright dust off those priors and have at it.
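A quick check of the "chop off the top X self-reporting" scenario under the zero-correlation assumption (my own sketch; the cutoffs are arbitrary): rejecting everyone whose self-estimate is above a threshold removes the same mix of good and bad performers as the population at large.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000

    performance = rng.uniform(0, 100, n)  # true performance
    self_report = rng.uniform(0, 100, n)  # self-assessment, independent of performance

    rejected = self_report > 80           # "top X self-reporting"
    high_perf = performance > 50

    print(f"high performers among rejected: {np.mean(high_perf[rejected]):.1%}")
    print(f"high performers overall:        {np.mean(high_perf):.1%}")

Both come out around 50%, so in the no-correlation edge case the filter buys you nothing.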
[+] [-] jakear|3 years ago|reply
The point of DK is that when you don’t know shit, any non-degenerate self assessment will result in overestimating your ability. In short, “there are more natural numbers above smaller natural numbers than bigger ones”. This doesn’t have to do with psychology, and it’s expected that it appears when evaluating random data. That’s a good thing! It means DK exists even when us pesky humans aren’t involved at all, not that DK doesn’t exist at all.
[+] [-] fallingfrog|3 years ago|reply
[+] [-] sfvisser|3 years ago|reply
[+] [-] ithkuil|3 years ago|reply
Given that fact, it logically follows that people who score low on ability tests will more often than not have overestimated their ability (and the same at the other end of the spectrum).
You can frame this effect as autocorrelation if you wish or just as a logical consequence. But that's missing the point.
The point is: why on earth are humans so bad at estimating their own competence level as to make it practically indistinguishable from random guessing?
[+] [-] d0mine|3 years ago|reply
[+] [-] kizer|3 years ago|reply
But a comment above quoting the more recent paper presents a contradictory conclusion: that humans can self-assess with some accuracy. So now I’m confused again.
[+] [-] orf|3 years ago|reply
I’m surprised this wasn’t flagged as something pretty silly.
[+] [-] oceliker|3 years ago|reply
Suppose you make 1000 people take a test. Suppose all 1000 of these people are utterly incapable of evaluating themselves, so they just estimate their grade as a uniform random variable between 0 and 100, with an average of 50.
You plot the grades of each of the 4 quartiles and it shows a linear increase as expected. Let's say the bottom quartile had an average of 20, and the top had 80. But the average of estimated grades for each quartile is 50. Therefore, people who didn't do well ended up overestimating their score, while people who did well underestimated it.
In reality, nobody had any clue how to estimate their own success. Yet we see the Dunning-Kruger effect in the plot.
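That thought experiment is easy to run (a sketch using the comment's setup; grades and self-estimates are both drawn here as uniforms for illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1000

    grade = rng.uniform(0, 100, n)     # actual test grade
    estimate = rng.uniform(0, 100, n)  # self-estimate: pure guessing, mean 50

    quartile = np.digitize(grade, np.percentile(grade, [25, 50, 75]))
    for q in range(4):
        m = quartile == q
        print(f"quartile {q + 1}: actual grade {grade[m].mean():5.1f}, "
              f"estimated grade {estimate[m].mean():5.1f}")

Every quartile's average estimate comes out near 50, so the bottom quartile looks like overestimators and the top like underestimators, even though nobody's estimate carried any information.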
[+] [-] playpause|3 years ago|reply
[+] [-] jollybean|3 years ago|reply
And also, I think there is actually a tiny bit of DK going on.
And then, as you say, it gets amplified by the pseudo-literati.
[+] [-] highfrequency|3 years ago|reply
On the other hand, regression to the mean rather than autocorrelation does explain how you could get a spurious Dunning-Kruger effect. Say that 100 people all have some true skill level, and all undergo an assessment. Each person's score will be equal to their true skill level plus some random noise based on how they were performing that day or how the assessment's questions matched their knowledge. There will be a statistical effect where the people who did the worst on the test tend to be people with the most negative idiosyncratic noise term. Even if they have perfect self-knowledge about their true skill, they will tend to overestimate their score on this specific assessment.
Regression to the mean has broad relevance, and explains things like why we tend to be disappointed by the sequel to a great novel.
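A sketch of that regression-to-the-mean mechanism (my own illustration; the distributions are arbitrary): everyone knows their true skill exactly and predicts it as their score, but the observed score adds per-assessment noise. Grouping by observed score then makes the worst scorers look like overestimators.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000

    true_skill = rng.normal(60, 10, n)  # true skill level
    noise = rng.normal(0, 10, n)        # idiosyncratic, per-assessment noise
    score = true_skill + noise          # what the assessment actually measures
    predicted = true_skill              # perfect self-knowledge of true skill

    quartile = np.digitize(score, np.percentile(score, [25, 50, 75]))
    for q in range(4):
        m = quartile == q
        print(f"score quartile {q + 1}: mean score {score[m].mean():5.1f}, "
              f"mean prediction {predicted[m].mean():5.1f}")

The bottom score quartile is dominated by people who had a bad day, so their perfectly calibrated predictions sit above their scores.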
[+] [-] roguecoder|3 years ago|reply
It would require them to be _even worse than random_ for them to be worse at estimating their abilities, rather than simply being judged for being bad at the task. It is only human attribution bias that leads us to assume that people should already know whether they are good or bad at a task without needing to be told.
The study assumed that the results on the task are non-random, performance is objective, and that people should reasonably have been expected to have updated their uniform Bayesian priors before the study began.
If any of those are not true, we would still see the same correlation, but it wouldn't mean anything except that people shared a reasonable prior about their likely performance on the task.
People will nevertheless attribute "accurate" estimates to some kind of skill or ability, when the only thing that happened is that you lucked into scoring an average score. You could ask people how well they would do at predicting a coin flip and after the fact it would look like whoever guessed wrong over-estimated their "ability" and a person who guessed right under-estimated theirs, even though they were both exactly accurate.
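The coin-flip version is easy to simulate (a sketch of the scenario just described): everyone predicts a 50% chance of calling the flip, which is exactly right, yet split after the fact by outcome the losers look like overestimators and the winners like underestimators.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 100_000

    predicted_accuracy = np.full(n, 0.5)   # everyone says "I'll be right half the time"
    guessed_right = rng.integers(0, 2, n)  # 1 = called the flip correctly, 0 = missed

    for outcome, label in ((0, "guessed wrong"), (1, "guessed right")):
        m = guessed_right == outcome
        print(f"{label}: actual accuracy {guessed_right[m].mean():.0%}, "
              f"predicted accuracy {predicted_accuracy[m].mean():.0%}")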
This comment section clearly demonstrates the attribution bias that makes this myth appealing, though. And this blog post demonstrates how difficult it is to effectively explain the implications of Bayesian reasoning without using the concept.
[+] [-] cryptica|3 years ago|reply
I've never had impostor syndrome though. To have impostor syndrome, you have to be given opportunities which are significantly above what you deserve.
I did get a few opportunities in my early career which were slightly above my capabilities but not enough to make me feel like an impostor. In the past few years, all opportunities I've been given have been below my capabilities. I know based on feedback from colleagues and others.
For example, when I apply for jobs, employers often ask me "You've worked on all these amazing, challenging projects, why do you want to work on our boring project?" It's difficult to explain to them that I just need the money... They must think that with a resume like mine I should be in very high demand or a millionaire who doesn't need to work.
I've worked for a successful e-learning startup, launched successful open source projects, worked for a YC-backed company, worked on a successful blockchain project. My resume looks excellent but it doesn't translate to opportunities for some reason.
[+] [-] oh_my_goodness|3 years ago|reply
It is unnecessary to walk the reader through autocorrelation in order to achieve a poorer understanding of that simple result.
[+] [-] IncRnd|3 years ago|reply
Close. It's the cognitive bias where unskilled people greatly overestimate their own knowledge or competence in that domain relative to objective criteria or to the performance of their peers or of people in general.
[+] [-] jl2718|3 years ago|reply
[+] [-] apienx|3 years ago|reply
Critiques cite the work being critiqued (yes, the referenced critiques in TFA cite the Dunning-Kruger study). Also, a 23-year-old paper will inevitably get cited more than 6-year-old papers. But yeah... the inertia in Science is real. That conservatism's a feature, not a bug.
Psychology's probably the discipline with the shortest "half-life of knowledge": https://en.wikipedia.org/wiki/Half-life_of_knowledge
[+] [-] nomilk|3 years ago|reply
> An engineering degree went from having a half life of 35 years in ca. 1930 to about 10 years in 1960. A Delphi Poll showed that the half life of psychology as measured in 2016 ranged from 3.3 to 19 years depending on the specialty, with an average of a little over 7 years.
This is very interesting and makes me wonder what it is for tech careers, e.g. web devs, data scientists etc.