> Why so angry?
> I know I’ve taken this far too personally. I have no illusions that everything I read online should be correct, or about people’s susceptibility to a strong rhetoric cleverly bashing conventional science, even in great communities such as HN. But frankly, for the last few years, the world seems to be accelerating the rate at which it’s going crazy, and it feels to me a lot of that is related to people’s distrust in science (and statistics in particular). Something about the way the author conveniently swapped “purely random” with “null hypothesis” (when it’s inappropriate!) and happily went on to call the authors “unskilled and unaware of it”, and about the ease with which people jumped on to the “lies, damned lies, statistics” wagon but were very stubborn about getting off, got to me. Deeply. I couldn’t let this go.
I am afraid I actually agree with the author's point. The anti-intellectual, anti-scientific streak in many poor analyses claiming to debunk scientific research is deeply concerning in our society. Anyone trying to debunk scientific research should at least learn some basic analytic tools. This observation is independent of whether the original DK paper could have been better.
That said, I give the author of "The DK Effect is Autocorrelation" the benefit of the doubt. It is a human error to be overly zealous about some opinions without thinking them through.
Let's not forget though that a great deal of "science" is in fact trash[1]. The problem isn't really people being anti-science or pro-science. The problem is science being done poorly, whether by scientists in the credentialed sense, or amateurs.
There is no pat "trust science more" or "trust amateurs less" answer here. The actual answer is that if you want to understand research, you need to actually understand mathematical statistics and the philosophy of statistics fairly deeply. There just isn't any way around it.
What about the replication crisis? It's possible to use rigorously sound statistics to lie (or at least unknowingly spread falsehoods). I can't tell you how many times I've seen headlines or abstracts of studies that seem to contradict ones I've seen previously, and back and forth! Particularly in the social sciences.
I recall one study that said all white people are committing environmental racism against all non-white people. I dove in and read the whole thing wondering what method could have yielded scientific confidence in such a broad result. Turns out the model used was a semi-black box that required a request for access and a supercomputer to run. But it was in a Peer Reviewed Scientific Journal and had lots of Graduate Level Statistics so I guess it seemed trustworthy.
> That said, I give the author of "The DK Effect is Autocorrelation" the benefit of the doubt. It is a human error to be overly zealous about some opinions without thinking them through.
If only there were a term for "a cognitive bias whereby people with limited knowledge or competence in a given intellectual or social domain greatly overestimate their own knowledge or competence in that domain relative to objective criteria or to the performance of their peers or of people in general"
That happens when science is politicized, and any scientist critical of the “official” results is destroyed. From climate to Covid, there are so many areas where that happens.
It still seems to me like "The DK Effect is Autocorrelation" is basically correct. The important thing isn't whether or not independence should be the null hypothesis, because calling something a "null hypothesis" is just an arbitrary label that doesn't affect reality. The important thing is that what we can actually conclude from the Dunning-Kruger paper is a lot less than popular presentations of the concept claim. In particular, "more skilled people are better at predicting their own performance" is really not supported by the paper, since that's not true of random data, which has everyone being equally terrible at predicting their own performance. If the random data can reproduce that graph, then the graph can't be proof that more skilled people are also better predictors.
Anyway, "The DK Effect is Autocorrelation" definitely seems to be both statistically literate, and a good faith criticism of the Dunning-Kruger paper. In light of that, calling it "anti-scientific" seems unfair, since criticism and debate are an important part of science.
I think what contributes to this phenomenon are both second-option bias[1] and motivated reasoning, at least with respect to those who choose to believe in the poor analyses.
I read a lot of papers on behavioural economics and psychological decision-making experiments for university (Dunning-Kruger, Kahneman, etc.), and in my opinion the first autocorrelation article reads like a rebuttal paper, just more informal; the approach is scientific even if it may be flawed. This is how knowledge advances. I disagree that it is anti-science. Challenging accepted postulations is good. Even famous professors make mistakes, and I don't blame the writer for making an honest one. That's how we got this new piece of writing.
Behavioural science is a pretty new field, and it's pretty easy to get aberrant results or manipulate the results to show 'something' statistically. Many findings in earlier papers could not be replicated, had applied statistics incorrectly, or showed different results when research participants were not white college kids.
This is a whole other problem within academia: the pressure to publish something even when there is nothing, and perceived legitimacy based on the number of citations a paper has. My professor always said don't look at the number of citations; understand the method and the rebuttals. There were numerous low-citation but solid papers showing flaws in famous ones, but everyone who isn't deep into the subject holds the original assertion to be legitimate because it's "famous".
Most social science is shoddy, fake, or otherwise misleading (i.e. it proves nothing meaningful despite the claims of the researchers). If you believed every social science study you heard about, you'd be more wrong about the world than if you disbelieved them all.
> The anti-intellectual, anti-scientific streak in many poor analyses claiming to debunk some scientific research is deeply concerning in our society.
People endlessly reference the Dunning-Kruger effect as a meme, without ever having read the paper, let alone having checked its methods. You don't seem to have a problem with that.
On the other hand, after seeing an article that uses essentially statistical arguments to debate a scientific study you conclude that there is some "anti-intellectual, anti-scientific streak" in our society and that it should be of grave concern.
This doesn't make any sense except as an extreme case of virtue-signaling.
Read the actual paper [1], there is so much more than those charts. They ask for an assessment of one's own test score and an assessment of one's ranking among the other participants, to distinguish between misjudgments of one's own abilities and of the abilities of others. They give participants access to the tests of other participants and check how this affects self-assessments: competent participants realize that they have overestimated the performance of other participants and now assess their own performance as better than before; incompetent participants do not learn from this and also assess their own performance as even better than before. They randomly split participants into two groups after a test, give one group additional training on the test task, and then ask all of them to reconsider their self-assessments: incompetent participants that received additional training are now more competent, and their self-assessment becomes more accurate. This is not everything from the paper and is probably also somewhat oversimplified; I just want to provide a better idea of what is actually in there.
Everyone is free to question the results, but after actually reading the entire paper I can confidently say that poking a bit at the correlation in the charts falls way short of undermining the actual findings from the paper. The actual results are much more detailed and nuanced than two straight lines at an angle.
I think if you wanted to poke holes in the paper you'd start with the generic issues that are typical to much psychological research:
1. It uses a tiny sample size.
2. It assumes American psych undergrads are representative of the entire human race.
3. It uses stupid and incredibly subjective tests, then combines that with cherry picking:
"Thus, in Study 1 we presented participants with a series of jokes and asked them to rate the humor of each one. We then compared their ratings with those provided by a panel of experts, namely, professional comedians who make their living by recognizing what is funny and reporting it to their audiences. By comparing each participant's ratings with those of our expert panel, we could roughly assess participants' ability to spot humor ... we wanted to discover whether those who did poorly on our measure would recognize the low quality of their performance. Would they recognize it or would they be unaware?"
In other words, if you like the same humor as professors and their hand-picked "joke experts" then you will be assessed as "competent". If you don't, then you will be assessed as "incompetent".
Of course, we can already guess what happened next - their hand-picked experts didn't agree on which of their hand-picked jokes were funny. No problem. Rather than realize this is evidence that their study design is maybe not reliable, they just tossed the outliers:
"Although the ratings provided by the eight comedians were moderately reliable (a = .72), an
analysis of interrater correlations found that one (and only one) comedian's ratings failed to correlate positively with the others (mean r = -.09). We thus excluded this comedian's ratings in our calculation of the humor value of each joke"
The fact that this actually made it into their study at all, that peer reviewers didn't immediately reject it, and that the Dunning-Kruger effect became famous, is a great example of why people don't, or shouldn't, take the social sciences seriously.
The plot to me always read "People estimate themselves at the 60-70th percentile - above average, but not the best". And then, given this broad prior, people do place themselves accurately (because the plot is increasing).
So it seems people are bad at doing global rankings. If I tried to rank myself amongst all programmers worldwide, that seems really hard and I could see myself picking some "safe" above-average value just because I don't know that many other people.
There's also: If you take 1 class in piano 30 years ago and can only play 1 simple song, that might put you in the 90th percentile worldwide just because most people can't play at all. But you might be at the 10th percentile amongst people who've taken at least 1 class. So doing a global ranking can be very difficult if you aren't exactly sure what the denominator set looks like.
So I think it's an artifact of using "ranking" as an axis. If the metric was, "predict the percentage of questions you got correct" vs. "predict your ranking", maybe people would be more accurate because it wouldn't involve estimating the denominator set.
This is exactly my conclusion, and it seems obvious... just look at the self assessment line - pretty much everyone thinks they are slightly above average. Once you know that everyone thinks they are above average, you already know how it will play out... the bottom quartile will have the biggest gap between actual skill and estimated skill.
> There's also: If you take 1 class in piano 30 years ago and can only play 1 simple song, that might put you in the 90th percentile worldwide just because most people can't play at all. But you might be at the 10th percentile amongst people who've taken at least 1 class. So doing a global ranking can be very difficult if you aren't exactly sure what the denominator set looks like.
Yes, and this literally implies that people in the lowest quartiles can't and won't rate themselves to be in the lowest quartiles when they are forced to give an answer. (Especially on tests that don't measure anything (getting jokes? really?), on tests that they have no knowledge about (how would they know how their classmates perform on an IQ test???), or on tests that just have high variance.)
And therefore they will "overestimate their performance".
It's like grouping a bunch of random people and forcing them to answer whether their house is short, average, or tall. The "people living in short houses" will "overestimate the height of their houses", while the "people living in towers" will humbly say they live in a house of average height.
Is this an existing and relevant psychological phenomenon, different from the general inability to guess unknown things? I don't think so.
I think Dunning-Kruger makes intuitive sense. When you become skilled in your field you learn from other people in your field, and your assessment of yourself is based on your relation to the skills of those other people. But if you know very little about something, you have no reference point to evaluate yourself against.
When you learn something you also learn some of the mistakes you can make. You then evaluate your performance against the mistakes you didn't make. Consider a piano player, or a figure skater. You have to know which figures are difficult to perform to evaluate a performance, and you don't know which ones are difficult until you have studied and tried to perform them.
It’s been argued before that this is the only reason that DK gained any notoriety; because it feels right, not because it is right. It’s the “just-world” theory: we want to believe that confident people are overcompensating.
Is it actually intuitive though? Consider your own example. Most people who don’t know piano or figure skating are well aware that they don’t know, and do not rate themselves highly at all. Would it be surprising to learn that people who don’t know any law or engineering don’t often hold any doubts about their lack of skill, and by and large are not deluded nor erroneously believe they’re great at these things they don’t know?
The DK paper didn’t measure knowledge-based skills like piano, figure skating, or law. It measured things like the ability to get a joke, and conversational grammar. How would you rate your own ability to get a joke? (Does this question really even make a lot of sense?)
It’s important that the methods in the DK paper focused on tasks that are hard to self-evaluate, because when people have tried to replicate DK with more well defined knowledge-based activities, they have often demonstrated the complete opposite effect, that there is widespread impostor syndrome, and skilled people underestimate themselves.
It's also intuitive if you think about the error in self-assessment. Skill is asymptotic to some upper bound, and the closer you are to the asymptote (the higher the skill), the more likely the estimation error falls below the true value, since it cannot fall above the bound.
Conversely, skill cannot be below zero, so at the low end the error is most likely to land above the actual skill line (overestimation, since the estimate is clamped from below).
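A quick simulation sketch of this clamping argument (my own illustration, not from the paper; it assumes uniform actual scores on a 0-100 scale and symmetric Gaussian estimation noise):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    score = rng.uniform(0, 100, n)              # actual performance, 0-100
    noise = rng.normal(0, 25, n)                # symmetric estimation error
    estimate = np.clip(score + noise, 0, 100)   # self-estimate clamped to the scale

    # Mean estimation error by actual-score quartile: positive (overestimation)
    # at the bottom, negative (underestimation) at the top, from clamping alone.
    quartile = np.digitize(score, [25, 50, 75])
    for q in range(4):
        err = (estimate - score)[quartile == q].mean()
        print(f"quartile {q + 1}: mean (estimate - score) = {err:+.1f}")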
Most people can't play the piano or skate. Don't consider those ones. Consider the things that everyone can do. Let's pick driving a car. I am fairly convinced that most people feel after a few years they are excellent at driving their car but in fact they are just OK to terrible. And this is with a lot of practice!
I think that's assuming more ignorance than even an unskilled person has.
If you had never listened to a professional play piano before then you'd have no idea what level of performance is possible. Similarly, if you had never seen skilled skaters perform on TV.
But we have done these things, so it's obvious that they're doing something that's very difficult.
Maybe you don't fully appreciate the skill, though. You wouldn't do well as a judge who compares the performances of professionals. But comparing novices to professionals seems easy?
Your post reminded me of one of my favorite Adam Savage videos where he touches upon this idea you're exploring. I encourage folks to see it, he articulates it so well.
I linked to the start of the video where he begins to build the idea. TLDR is he mentions Monet painting Impression Sunrise and how it was something that people have never seen before and it took a bit of time for it to blow people away--they needed to develop "new eyes" to see the genius. Adam then dives into this idea of "new eyes". I'm sure many of us have experienced this in our life and it was so nice to hear Adam unpack it.
If human cultures can be characterized as default arrogant or default humble then it stands to reason that arrogant cultures will have a DK effect, and in humble cultures you won't.
Something I generally keep in mind about articles posted to HN:
A large portion of the HN audience really, really wants to think they're smarter than mostly everyone else, including most experts. Very few are. I'm certainly not.
Articles which "debunk" some commonly held belief, especially those wrapped in what appears to be an understandable, logical, followable argument, are going to be cat nip here.
Articles like this are even stronger cat nip. If a member of the HN audience wants to believe they're mostly smarter than mostly everyone else, that includes other members of the HN audience.
So, whenever I read an article and come away thinking that, having read the article, I'm suddenly smarter than a huge number of experts, especially if, like the original article, it's because I understand "this one simple trick!", I immediately discard that knowledge and forget I read it.
If the article is right, it will be debated and I'll see more articles about it, and it'll generate sufficient echoes in the right caves of the right experts. Once it does, I can change my view then.
I am not a statistician, or a research scientist. I have no idea which author is right. But, my spider sense says that if dozens of scientific papers, written by dozens of people who are, failed to notice their "effect" was just some mathematical oddity, that'd be pretty incredible.
And incredible things require incredible evidence. And a blog post rarely, if ever, meets that standard.
"The second option conforms with the Research Methods 101 rule-of-thumb “always assume independence.” Until proven otherwise, we should assume people have no ability to self-assess their performance"
It's not that at all. The assumption should be that everyone is equally good (or bad) at assessing their performance. Not that they have no ability, but that the mean self-assessment ability is the same between groups rather than different - that the ability to assess themselves is independent of performance.
This confused me at first too. The issue is that "X" is your performance, and "Y" is your perceived performance.
Say that everyone is equally okay at assessing themselves, and gets within 0.1 of their actual performance (rated from 0 to 1). Then X and Y are going to be very correlated, since X - 0.1 < Y < X + 0.1. But X - Y will look like a random plot, since Y is randomly sampled around X.
The only case where X and Y wouldn't correlate at all is if people have no ability to assess their performance (i.e., Y isn't sampled around X, but is instead sampled from a fixed range).
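A minimal sketch of that point (assuming, per the comment, performance uniform on [0, 1] and self-assessments landing within 0.1 of it):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10_000
    x = rng.uniform(0, 1, n)               # actual performance
    y = x + rng.uniform(-0.1, 0.1, n)      # self-assessment sampled around x

    print(np.corrcoef(x, y)[0, 1])         # ~0.98: X and Y strongly correlated
    print(np.corrcoef(x, x - y)[0, 1])     # ~0: the gap X - Y is just the noise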
The less you know, the more random your guess at your own knowledge is. The actual value is low and less than zero isn't an option, so this drags the average up consistently.
The more you know, the more accurate your guess of your knowledge is. Especially as you hit the limits of the test, this noise can only drag the average down, but less dramatically than the other case.
With the reasonable conclusion: we all suck at guessing how much we know, but the more you know the less you suck, until you hit the limits of the framework you are using for quantifying knowledge.
I had the same thought while reading this. The test has a limited range of values, you can only estimate your score within that range, no higher or lower. Those at the top and bottom will naturally estimate into the body of the range since a lower or higher estimate is not possible. However, I’m not sure this explains the results entirely, and I’d like to see a statistician take this further.
Assuming a world where all the participants understand normal distributions, would this be addressed by asking people to rate how they did in terms of "standard deviations compared to the average" or such?
I’m not a statistician but I do have some basic training in psychometrics. It might be interesting/helpful to point out that your priors about self-assessment seem more reasonable generally but also put a lot of faith in the test’s validity as a measure of skill.
I’m relying on intuition here, but it seems a little problematic that the actual score and the predicted score are both bound to the same measurement scheme. Given that constraint, on some level we’re not really talking about an external construct of skill, just test performance and whether people estimate it well. Which is different from estimating their skill well.
Maybe someone with more actual skill can elaborate or correct haha.
What’s more interesting to me is what all the buzz over DK tells me. We are asymmetrically skeptical. In the same way as intelligent people doubt their own performance, they rightly doubt others’ performance. Maybe too much.
I think that most people who talk a lot about DK believe that they are the experts in one field or another.
It serves mostly as a way of reassuring themselves of their own superiority. The message (for them) basically amounts to "other people's claim to knowledge is just further proof that they don't know anything."
It's called a giant fucking lack of self awareness, with a good helping of societally instilled narcissism on the worst side of it all, and then add in imposter syndrome, self righteousness and gaslighting. The best side of it all basically is all of these things, but with a tight leash on things and sans the gaslighting. There might be better, but those people are probably off doing their own thing minding their own business; etc.
Well done. I read the autocorrelation post when it came out a couple of weeks back and it didn’t sit right with me, but I didn’t have the motivation to figure out why. Your explanation resonates perfectly with my initial (snap) intuition, and I thank you for taking the time to write it out and post!
Gah, I wish I had time to fully read this and get into it, but I have to spend the next few hours driving.
Unfortunately the original article isn't very clearly explained, and it's only on reading the discussion in the comments under it that it becomes clear what it's actually saying.
The point is about signal & noise. Say your random variable X contains a signal component and a noise component, the former deterministic and the latter random. Say you correlate Y-X against X, and further say you use the same sample of X when computing Y-X as when measuring X. In this case your correlation will include the correlation of a single sample of the noise part of X with its own negation, yielding a spurious negative component that is unrelated to the signal but arises purely from the noise. The problem can be avoided by using a separate sample of X when computing Y-X.
The example in the original "DK is autocorrelation" article is an extreme illustration of this. Here, there is no signal at all and X is pure noise. Since the same sample of X is used a strong negative correlation is observed. The key point though is that if you use a separate sample of X that correlation disappears completely. I don't think people are realising that in the example given the random result X will yield another totally random value if sampled again. It's not a random result per person, it's a random result per testing of a person.
This is only one objection to the DK analysis, but it's a significant one AFAICS. It can be expected that any measurement of "skill" will involve a noise component. If you want to correlate two signals both mixed with the same noise sources you need to construct the experiment such that the noise is sampled separately in the two cases you're correlating.
Of course the extent to which this matters depends on the extent to which the measurement is noisy. Less noise should mean less contribution of this spurious autocorrelation to the overall correlation.
To give another ridiculous, extreme illustration: you could throw a die a thousand times and take each result and write it down twice. You could observe that (of course) the first copy of the value predicts the second copy perfectly. If instead you throw the die twice at each step of the experiment and write those separately sampled values down you will see no such relationship.
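A sketch of the resampling point (my own toy numbers: standard-normal signal and noise throughout; the only difference between the two correlations is whether X is resampled):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000
    signal = rng.normal(0, 1, n)          # deterministic per-person component
    x1 = signal + rng.normal(0, 1, n)     # first measurement of X
    x2 = signal + rng.normal(0, 1, n)     # separately sampled measurement of X
    y = signal + rng.normal(0, 1, n)      # another noisy measurement of the signal

    # Same sample of X on both sides: the shared noise yields a spurious
    # negative component (about -0.5 with these variances).
    print(np.corrcoef(y - x1, x1)[0, 1])

    # Separate sample of X: the spurious component disappears (about 0).
    print(np.corrcoef(y - x1, x2)[0, 1])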
Hey omnicognate, good to see you here, appreciated our previous discussion.
What you're saying is that we need to verify the statistical reliability of the skill tests DK gave, and to some extent that we need to scrutinize the assumption that there indeed is such a thing as "skill" to be measured in the first place. I hope we can both agree that skill exists. That leaves the test reliability (technical term from statistics, not in the broad sense).
What's simulated by purely random numbers is tests with no reliability whatsoever. Of course if the tests DK gave to subjects don't actually measure anything at all, the DK study is meaningless. If that's what the original article's author is trying to say, they sure do it in a very roundabout way, not mentioning the test reliability at all. I'd be completely fine reading an article examining the reliability of the tests. Otherwise, I again fail to see how the random number analysis has anything to do with the conclusions of DK.
In fact, DK do concern themselves with the test reliability, at least to some extent. That doesn't appear in the graph under scrutiny but appears in the study.
If you assume the tests are reliable, and you also assume that DK are wrong (i.e., that people's self-assessment is highly correlated with their performance), and generate random data accordingly, you'll still get no effect even if you sample twice as you propose.
> The key point though is that if you use a separate sample of X that correlation disappears completely
A separate sample of X under the assumption of no dependence at all on the first sample, i.e., assuming there is no such thing as skill, or assuming completely unreliable tests. So, not interesting assumptions, unless you want to call the test reliability into question, which neither you nor the author are directly doing.
Beyond the validity of the statistical methods used... can someone clarify what the actual hypothesis is that we are debating about competence? And what does each article propose?
My understanding is that the hypothesis is "Those who are incompetent overestimate themselves, and experts underestimate themselves".
DK says: True
DK is Autocorrelation says: ???
"I cant let go..." says: True?
HN says: also True?
Is there really any debate here? The "DK is Autocorrelation" article seems to be the only odd one out, and it's not clear if it even makes a proposal either way about the DK hypothesis. It talks about the Nuhfer study, but that seems Apples vs Oranges since it buckets by education level. Then it also points out that random noise would also yield the DK effect. But that also does not address the DK hypothesis, and it would indeed be very surprising if people's self evaluation was random!
So should my takeaway here just be that the DK hypothesis is True and that this is all arguing over details?
DK is Autocorrelation says: The DK article is based on a false premise, we got to disregard it
"I cant let go..." says: Actually, given that we assume people are somewhat capable of self-assessment, which is reasonable, "DK is Autocorrelation" is the one based on a false premise, and we should disregard that one instead, and not DK.
> My understanding is that the hypothesis is "Those who are incompetent overestimate themselves, and experts underestimate themselves".
The DK hypothesis is the "double burden of the incompetent": "Because incompetent people are incompetent, they fail to comprehend their incompetence and therefore overestimate their abilities more than experts underestimate theirs"
Arguably the hypothesis that matches the data from the DK paper best is: "Everyone thinks they're average regardless of skill level"
What I don't like about statistics, or rather the use of them, is the tendency to focus exclusively on them instead of treating them as the tool they are. Statistical analysis is not the subject of the DK effect or paper; it is a tool D&K used in analyzing the effect, nothing else. D&K put more expertise, research, and knowledge into their work than simple statistics.
I hate it when people "solely" use statistics, and other first-principles thinking approaches, to understand well-researched and documented topics. And I hate it when people use statistics alone to criticize research without considering its other aspects. Does it mean the DK effect can be discarded or not? I don't know; I think some disagreement over the statistical methods is not enough to come to any conclusion.
Attacking the Dunning-Kruger study on statistical grounds alone looks like a prime example of the DK effect in itself...
For anyone who is interested in playing around with these charts, the various assumptions that underpin them, etc., I've thrown together a colab notebook as a starting point.
Observation: if you rank via true "skill" and assume for a particular instance that the predicted performance and observed performance are independent but both have the true skill as their mean, you don't observe the effect. CC of 0.00332755.
If you rank via observed performance and plot observed vs predicted the effect is there. CC of -0.38085757.
This is assuming very simple gaussian noise which is not going to be accurate especially as most of these tasks have normalised scores.
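For reference, a minimal sketch of the kind of simulation described above (assuming unit-variance Gaussian skill and noise; the exact coefficients depend on the noise scale, so they won't match the quoted CCs exactly):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000
    skill = rng.normal(0, 1, n)                 # true latent skill
    observed = skill + rng.normal(0, 1, n)      # observed test performance
    predicted = skill + rng.normal(0, 1, n)     # self-predicted performance
    gap = predicted - observed

    # Condition on true skill: no effect (~0).
    print(np.corrcoef(skill, gap)[0, 1])

    # Condition on observed performance: the shared noise term reappears
    # as a negative correlation (~ -0.5 with these variances).
    print(np.corrcoef(observed, gap)[0, 1])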
What your simulation includes and the original article didn't (and I didn't touch at all in my article) is the statistical reliability of the tests they administered. Where you got a CC of -0.38 you used equal reliability (/ unreliability) of the skill tests and self-assessments. You can see that as you increase the test reliability, the CC shrinks and the effect disappears.
I have no idea what's the actual reliability of the DK tests, they do seem to consider that but maybe not thoroughly enough. In my view it's very fair to criticize DK from that angle. But that would require looking at the actual tests and their data.
My point being, that any purely random analysis is based on assumptions that can easily be tweaked to show the same effect, the opposite effect, or no effect at all.
Would it be possible to understand the results differently? It looks to me that the data could be explained by the participants moderating their self-assessment away from extremes, or perhaps towards the population mean, which is arguably not an unreasonable thing to do if your knowledge of the population mean is better than your knowledge of your own performance.
And this is why we need error bars on all plots. Looking at these plots there is no way to know whether people guessed uniformly or whether the self assessment is clustered around the mean.
Yeah I agree that's the likely explanation. Nobody wants to admit that they're terrible and nobody wants to boast that they're the best and be proven wrong.
So my suspicion is that the DK effect is not really a symptom of people's inability to accurately self-assess, but of their unwillingness to accurately report that self-assessment.
And I don't think it is unique to self assessment either. It's common knowledge that ratings on a scale out of 10 for pretty much everything are nearly always between 6 and 9.
I don't know how they did the experiment but I bet they'd get different results if the self-assessments were anonymous and accuracy came with a big financial reward.
Anyway that's all irrelevant to the point of the article which I think is correct.
I think the main point of this post is correct -- just because you can find the effect in random noise doesn't mean it's not a real phenomenon that happens in real life. But it's missing a nuance there: if an effect can be replicated with random noise, then it's not a psychological effect (e.g. something that you would explain as a human bias), but a statistical effect. E.g. regression towards the mean is a real effect, but it's a statistical effect, not a psychological effect.
And that's the point the original article was trying to make ("The reason turns out to be embarrassingly simple: the Dunning-Kruger effect has nothing to do with human psychology. It is a statistical artifact — a stunning example of autocorrelation."), though that point does get lost a bit as the article goes on.
> if an effect can be replicated with random noise, then it's not a psychological effect
This isn't true either. Statistical dependence does not determine or uniquely identify causal interpretation or system structure. See Judea Pearl's works (e.g. The Book of Why) for more on this.
People lacking the ability to self-assess is interesting psychologically. People can learn from experience in many other contexts. People can judge their relative position versus other people in many contexts. Why would they be so bad at this particular task? There could be a psychological underpinning.
Even if it turns out we have useless noise-emitting fluff in the place that would produce self-awareness of skill, that would be a psychological cause of a psychological effect. Not the ones that Dunning and Kruger believed they were seeing, but still.
Now, if you asked frogs for a self-assessment of skill, I would expect that data would not show any psychological effects.
To riff on one of the author's previous comments, if height was uncorrelated with age for 0-20 year olds, that would be very surprising, and hopefully we wouldn't need to make posts saying "the fact 20 year olds are just as likely to be 1 ft tall as 1 year olds is not a physical phenomenon, it's a statistical effect."
As a novice on DK, it seems to me that, for DK to be 'surprising' (in the parlance of the OP), four phenomena must hold:
1) an incompetent person is poorer than average at self assessment of their skill
2) as a person's competence increases at a skill, their ability to self-assess improves, until they become 'expert' which is defined by underappreciating their own skill (or overappreciating the skill of others)
3) DK is surprising (interesting) only when some incompetent persons who suffer from DK cannot improve their performance, presumably because their poor self-assessment prevents their learning from experience or from others.
4) Worse yet, some persons suffering from DK cannot improve their performance in numerous skill areas, presumably because their poor self-assessment is caused by a broad cognitive deficit (e.g. political bias), preventing them from improving on multiple fronts (which are probably related in some thematic way).
If DK is selective to include only one or two skill areas, as in case 3, that is not especially surprising, since most of us have skill deficits that we never surmount (e.g. bad at math, bad at drawing, etc). DK becomes surprising only in case 4, when we claim there is a select group of persons who have broad learning deficits, presumably rooted in poor assessment of self AND others — to wit, they cannot recognize the difference between good performance and bad, in themselves or others. Presumably they prefer delusion (possibly rooted in politics or gangsterism) to their acknowledgement of enumerable and measurable characteristics that separate superior from inferior performance, and that reflect hard work leading to the mastery of subtle technique.
If case 4 is what makes DK surprising, then DK certainly is not described well by the label 'autocorrelation' — which seems only to describe the growth process of a caterpillar as it matures into a butterfly.
>it seems to me that, for DK to be 'surprising' (in the parlance of the OP), four phenomena must hold:
The surprising thing about DK, to me at any rate, is how unvarying it is in application. Under DK, people who are poor at something never think "wow, I really suck at this", or if they do, they are such a minuscule part of the population that we can discount them.
I've known lots of people who were not good at particular things and did not rate themselves as competent at it, although truth is they might have claimed competence if asked by someone they didn't want to be honest with.
On a pure human level, a large portion of DK discourse seems to be a fight over which people are the "Unskilled and Unaware." Or more bluntly, who gets to call who stupid.
The author says as much in this article:
> Why so angry? [...] [Frankly], for the last few years, the world seems to be accelerating the rate at which it’s going crazy, and it feels to me a lot of that is related to people’s distrust in science (and statistics in particular). Something about the way the author conveniently swapped “purely random” with “null hypothesis” (when it’s inappropriate!) and happily went on to call the authors “unskilled and unaware of it”, and about the ease with which people jumped on to the “lies, damned lies, statistics” wagon but were very stubborn about getting off, got to me. Deeply. I couldn’t let this go.
> In their seminal paper, Dunning and Kruger are the ones broadcasting their (statistical) incompetence by conflating autocorrelation for a psychological effect. In this light, the paper’s title may still be appropriate. It’s just that it was the authors (not the test subjects) who were ‘unskilled and unaware of it’.
But on some level, the original paper sounds just as condescending and dismissive. It presents a scholarly and statistical framework for looking down on "the incompetent" (a phrase used four times in the original paper). In practice, most of the times I see the DK effect cited, it functions as a highbrow and socially acceptable way of calling someone else stupid, in not so many words.
Cards on the table, I've never liked DK discourse for this reason. It's always easy to imagine others as the "Unskilled and Unaware", and for this reason bringing DK into any discussion rarely generates much insight.
> it functions as a highbrow and socially acceptable way of calling someone else stupid
I think it's even worse than that: it's also a socially acceptable way of enforcing credentialism and looking down on others for not having a sufficiently elite education.
When I saw the graphs in the original article I immediately came to a different conclusion - that people with a given amount of skill have low confidence in their ability to gauge how skilled they are compared to an arbitrary group.
For example, if someone gave me (or you) a leetcode-style test, told me I'd be competing against a sample picked from the general population, and asked me how well I did, I'd probably rate myself near the top with high confidence.
Conversely, if my competitors were skilled competitive coders, I'd put myself near the bottom, again with high confidence.
Now, if I had to compete with a different group, say my college classmates, or fellow engineers from a different department, I'd be in trouble: if I scored high, what does that mean? Maybe others scored even higher. Or if I couldn't solve half of the problems, maybe others could solve even fewer - the point is, I don't know.
In that case the reasonable approach for me would be to assume I'm in the 50th percentile, then adjust it a bit based on my feelings - which is basically what happened in this scenario, and would produce the exact same graph if everyone behaved like that.
No need to tell tall tales of humble prodigies and boastful incompetents.
> Again, my main point is that there’s nothing inherently flawed with the analysis and plots presented in the original paper.
I find the use of quartiles suspicious, personally. It's very nearly the ecological fallacy[1].
> I’m not going to start reviewing and comparing signal-to-noise ratios in Dunning-Kruger replications
DK has been under fire for a while now, nearly as long as the paper has existed[2]. At present, I am in the "effect may be real but is not well supported by the original paper" camp. If DK wanted to they could release the original data, or otherwise encourage a replication.
Agree. From the DK article graph it is not possible to separate the cases
1. Average self assessment coincides with true skill, but variance increases with low skill.
2. Average self assessment is biased, and the bias is positive when you are unskilled and negative when you're highly skilled.
These two situations would create indistinguishable DK-graphs. I don't understand how anyone can be sure on either (1) or (2) after seeing one instance of such a graph.
As I see it, the only way out for "DK positivists" is to say that the DK hypothesis is unrelated to the truth values of (1) and (2). Or, that there is other evidence making DK convincing.
FWIW, the extreme-groups design (e.g. using upper and lower quartiles) is well understood to inflate effect sizes (there are even formulas to correct for this, given an extreme-groups design).
It's definitely related to ecological fallacy in the sense that both underestimate relative error and inflate effect sizes.
If others can't replicate it entirely on their own without "encouragement", then it isn't useful at all, and the original experiment can be safely ignored as irrelevant to humanity, along with any "prestige" associated with it.
If you measure competence as relative performance, a person cannot know how competent they are compared to others... because to do that correctly, they would not only have to know how much they know but also know how much other people know... preferably in relation to them.
This is not possible, so the self-assessment data will be random because it is a random guess... so it does not correlate with actual performance or anything else for that matter. Hence, the DK effect has to be a result of faulty statistical analysis.
I believe we'd have completely different results if the question was framed differently: "how many do you believe you got right?". Then, more confident people, regardless of competence, would answer that they got more right and less confident people, again regardless of competence, would believe that they must have gotten more wrong than they did.
> If you tell me you didn’t have a single serious thought of self-assessing today, even semi-conscious, I simply won’t believe you.
I stopped reading at this point. Someone that is so certain that they say “I simply won’t believe you.” is too self-assured to be worth paying much attention to.
The author seems to go completely astray at some point.
> “Never assume dependence” gets so ingrained that people stubbornly hold on to the argument in the face of all the common sense I can conjure. If you still disagree that assuming dependence makes more sense in this case, I guess our worldviews are so different we can’t really have a meaningful discussion.
Hypothesis testing is concerned with minimizing Type I and Type II errors. In the Neyman-Pearson framework this calls for a specific choice of the null hypothesis. Of course nothing prevents you from defining the sets for H0 and H1 as arbitrarily as you want, as long as you can mathematically justify your results.
It seems like the author fundamentally misunderstands the basics of statistics.
It bugs me that DK reached popular consciousness and get misinterpreted and misused more often than not. For one, the paper shows a positive correlation between confidence and skill. The paper is very clearly leading the reader, starting with the title. The biggest problem with the paper is not the methodology nor the statistics, it’s that the waxy prose comes to a conclusion that isn’t directly supported by their own data. People who are unskilled and unaware of it is not the only explanation for what they measured, nor is that even particularly likely, since they didn’t actually test anyone who’s verifiably or even suspected to be incompetent. They tested only Cornell undergrads volunteering for extra credit.
If DK is regression to the mean (a view I find convincing) that doesn't mean the effect isn't real; i.e. one would still observe that people of low ability overestimate their ability, simply because there is more "room" for overestimates than underestimates. And vice versa.
Put differently, if everyone's estimate was exactly the mean, you'd still see a "DK effect".
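A toy illustration of that last point (hypothetical numbers; everyone reports exactly the average):

    import numpy as np

    rng = np.random.default_rng(5)
    score = rng.uniform(0, 100, 100_000)      # actual performance
    estimate = np.full_like(score, 50.0)      # everyone estimates the mean

    # The quartile plot still shows the classic DK crossover pattern,
    # with no differential self-insight anywhere in the model.
    quartile = np.digitize(score, [25, 50, 75])
    for q in range(4):
        gap = (estimate - score)[quartile == q].mean()
        print(f"quartile {q + 1}: mean (estimate - actual) = {gap:+.1f}")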
I don't really understand the article. My understanding was that the mistake was that the error bounds differ depending on the test score from the original DK paper: a test score of 0 or 100 allows a potential error of up to 100 points, whereas a test score of 50 allows at most 50 in either direction. So if you take a group of people who score 0-25 points, even if their self-assessment is completely random you'd still see a bias toward overestimating their score, because people who would give themselves a lower score if possible are unable to.
The charts make it clear that people's self-assessment was (roughly) independent of their skill level. It's not obvious that students' self-assessment would be mostly random / unrelated to skill level. For me that's a non-obvious result.
If people wander off through the verbiage of any article, where the chatter isn't supported by data, sure, they'll tend to get speculation.
Imagine in the Dunning-Kruger chart the second plot (perceived ability) was a horizontal line at 70, which is not true but not far off from the real results. Now imagine I told you "did you know that, regardless of their actual score, everyone thought they got a 70?" That's a surprising fact.
I think the most egregious thing about the original presentation is that it leads you to believe that people with a given skill level all self-assessed similarly. If you plotted the scores and self-assessments of each individual you would see that it's not "everyone [in the first quartile] thought [they were about average]", it's that their self-assessments varied wildly, from low and accurate to high and inaccurate.
It seems like the people who want to disprove Dunning-Kruger are falling victim to it.
I honestly think people take it way too seriously and apply it too generally. Quantifying "good" is hard if you don't know much about the field you're quantifying. Getting deep into a particular field is humbling -- Tetris seems relatively simple, but there are people who could fill a book with things _I_ don't know about it, despite playing at least a few hundred hours of it.
Is there an answer to that humility gained by being an expert in one field being translated to better self-assessment in other fields? I feel myself further appreciating the depth and complexity of fields I "wrote off" as trivial and uninteresting when I was younger as I get deeper into my own field (and see just how much deeper it is too).
> Is there an answer to that humility gained by being an expert in one field being translated to better self-assessment in other fields?
I think that often the opposite is true: people who become experts in one domain often assume that they are automatically experts in completely unrelated fields. I suspect that this is the cause of "Nobel disease": https://en.wikipedia.org/wiki/Nobel_disease
The open question this raises for me is why a DK=true set of data would show up with the same graph as a uniformly random set.
What I'm really missing is a plot of the data without the aggregation. I find it very strange that X is broken down into quartiles but Y isn't, and that when bucketed into quartiles, people estimated their skills relative to each other quite well: the line still goes up, and from bottom to top it would be a perfect X-to-X correlation.
Uniformly random data means that someone’s perception of their ability is uncorrelated with their actual ability, which is exactly what DK=true is saying!
In partial "defense" of the "autocorrelation" article, the author was in fact arguing against their own perceived definition of DK, not what most people consider to be DK. They just didn't realise it.
Which is an all too common thing to begin with. (that particular article pulled the same stunt with the definition of the word 'autocorrelation', after all).
I read about DK and I was absolutely convinced that the effect was real. Then I read the article about DK being mere autocorrelation and I came away absolutely convinced that DK was bullshit. Then I read this article and I'm absolutely convinced that the 'DK is autocorrelation' hypothesis is utter BS. Sigh. There are lies, damned lies and statistics... :-)
Consider taking a more Bayesian view of the world, especially with scientific papers. I informally tell the students I work with to look for a constellation of papers that offer supporting evidence from multiple perspectives.
Me too. I believe the effect contains a logical recursion that is impossible to escape from. Maybe the randomness variable in it? It looks as if all validations and refutations of it are always going to appear logical. I don't know what to call it or compare it with but it feels this must be documented as being a prime example of its category.
Thank you infinitely for taking the time to respond.
I don't have this luxury in my life right now but I admit after reading the "original" post almost a fourth time, I was really hoping someone would take the time to explain why/how the author could be completely wrong (or not).
Sounds like the premise is flawed. He's assuming kids are good at getting another 10 minutes before bedtime. All of them? What about those who fail? Those that don't even try?
The issue is not the way our brains generalize, but that you are using just one brain, one life's experience.
> It can give us an indication of how the growth rate depends on size
Except that what you've plotted there isn't the growth rate, but the absolute growth. Your argument for DK isn't convincing either; they claimed something much stronger than that we can't assess our own skills.
Question for you folks that are smarter than me (see what I did there?) - DK has surfaced a lot here and in the online world more broadly with seemingly increased frequency. Why do you think that is?
Science is in deep crisis. Its only utility today is supporting industry and some public infrastructure. Social sciences are a scam, with economics being the greatest racket among them all.
tldr; D+K's experiment was: Assign the numbers 1 through 10 to ten people. Have each roll a 10-sided die. The person assigned a 1 will roll higher than his assigned number 90% of the time.
Daniel:
>It’s not a “statistical artifact” - that will be your everyday experience living in such a world.
You can experience statistical effects. I think a lot of controversy comes from how Dunning and Kruger's paper leads people to interpret the data as hubris on the part of low-performers, and the statistical analysis demolishes that interpretation. Not knowing how well you performed is not the same thing psychologically as "overestimating" your performance.
Dunning-Kruger is precisely about the surprising result that people are bad at estimating their performance!
If you accept the 'D-K is autocorrelation' argument, you don't get to throw out the existence of the D-K effect: you are saying Dunning and Kruger failed to show that humans have any ability to estimate how skilled they are at all.
That seems like an even more radical position than the D-K thesis.
The claim that skill does not exist or that people are totally unable to recognize how good they are at anything is quite radical.
You are sort of smuggling in the assumption for example that Olympian medalist lifters, when asked how much they can deadlift, will have the same distribution of answers as people who never deadlift (but are aware that totally sedentary men can probably deadlift like 200lbs and totally sedentary women can probably deadlift like 150lbs). If this were true, it would be worth publishing a paper about it.
It's sort of surprising to me to read your comment because TFA is an extended rebuttal of your comment.
> I think a lot of controversy comes from how Dunning and Kruger's paper leads people to interpret the data as hubris on the part of low-performers, and the statistical analysis demolishes that interpretation. Not knowing how well you performed is not the same thing psychologically as "overestimating" your performance.
D-K actually found that low performers were less accurate at assessing their skill than high performers, and the article you refer to obviously did not find this effect in random data, so I'm not sure how it was demolished.
> We don’t need statistics to learn about the world.
A sentence written by the author, commented on by me, and read by the HN community on devices which exist only thanks to 80-90 years of rigorous, statistics-based QA in engineering, especially in mechanical/hardware engineering.
Anyhow, after spending years on a team filled with social science PhDs, I would not waste my time reading papers about statistical analysis done by social scientists.
I feel like the author read the autocorrelation result, hated it and ignored the central point. There are ways to bucket data that removes the autocorrelation and in those experiments we also see the DK effect disappear. Trying to argue that we should study the effect with the autocorrelation present but ignore the autocorrelation for 'reasons' is not the way forward.
I feel like this article is severely over-complicating the analysis. Looking at the original blog post [1], their key claim appears to be that "random data produces the same curves as the DK effect, so the DK effect is a statistical artifact".
However, by "random data", the original blog means people and their self-assessments are completely independent! In fact, this is exactly what the DK effect is saying -- people are bad at self-evaluating [2]. (More precisely, poor performers overestimate their ability and high performers underestimate their ability.) In other words, the premise of the original blog post [1] is exactly the conclusion of DK!
Looking at the HN comments cited [3] by the current blog post, it appears that the main point of contention from other commenters was whether the DK effect means uncorrelated self-assessment or inversely correlated self-assessment. The DK data only supports the former, not the latter. I haven't looked at the original paper, but according to Wikipedia [2], the only claim being made appears to be the "uncorrelated" claim. (In fact, it is even weaker, since there is a slight positive correlation between performance and self-assessment.)
So, my conclusion would be that DK holds, but it does depend on what exactly the claim in the original DK paper is.
> I haven't looked at the original paper, but according to Wikipedia [2], the only claim being made appears to be the "uncorrelated" claim.
Is it that hard to actually check the original paper before bothering to make such a claim? The original paper explicitly claims to examine "why people tend to hold overly optimistic and miscalibrated views about themselves".
Yeah, the model is a simple linear model (which I've yet to see written down) with some correlation coefficient which is the unknown. Derive an estimator for that correlation coefficient, being explicit about the assumptions, then we can have a discussion. Until then it's all lots of noise. The raw data would help too.
The "The Dunning-Kruger Effect is Autocorrelation" article is an example of obvious bullshit.
Their claim that "If we have been working with random numbers, how could we possibly have replicated the Dunning-Kruger effect?" is the first blatantly false statement, and then the rest is built upon that so it can be safely disregarded.
It's easy to see this because while the effect is present if everyone evaluates themselves randomly, it's not present if everyone accurately evaluates themselves, and these are both clearly possible states of the world a priori, so it's a testable hypothesis about the real world, contrary to the bizarre claim in the paper.
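A sketch of that contrast (illustrative assumptions: uniform scores; "random" means self-assessments drawn independently of scores, "accurate" means they match exactly):

    import numpy as np

    rng = np.random.default_rng(6)
    score = rng.uniform(0, 100, 100_000)
    quartile = np.digitize(score, [25, 50, 75])

    for label, estimate in [("random", rng.uniform(0, 100, score.size)),
                            ("accurate", score)]:
        gaps = [(estimate - score)[quartile == q].mean() for q in range(4)]
        print(label, " ".join(f"{g:+.1f}" for g in gaps))

    # random   : the DK pattern (+37.5, +12.5, -12.5, -37.5)
    # accurate : no pattern (all zeros), so the two worlds are
    # empirically distinguishable, contrary to the article's claim.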
Also, the knowledge that the authors published that article provides evidence for the Dunning-Kruger effect being stronger than one would otherwise believe.
Your comment amounts to saying that some of the randomly generated data really is consistently over estimating it's performance. How absurd.
Like similar analyses here you don't factor in that DK is about bias. Of course you can't see bias when test score=self assessment. That's because "IF everyone perfectly knows their score then there is no bias in their assessment" is a tautology.
That original article was bogus and needlessly combative. I feel like the majority view in the HN comments saw it as such.
Most comments were splitting hairs on what _exactly_ the Dunning-Kruger effect was, plus some general nerd-sniping on how the original article was off base.
IMO it was something that fell flat on its own rather than something that needed a lengthy refutation, but I can understand that sometimes these things get under your skin.
Just based on the graph just under the "The Dunning-Kruger Effect" section, one observation I'd like to present is that the subjects's numerical self-assessments fall into the same range as passing but non-stellar grades do in school. This may reflect a psychological bias in how the subjects use and understand percentages. Accordingly, that the two lines cross is a red herring.
The corollary of Dunning Kruger is that everyone is equally capable and equally capable of assessing their performance. This nicely suits the current social rhetoric but does not match observed reality.
Any discussion of statistics-based reasoning should include the concept of systematic bias, and that's not mentioned in this article at all. An example of systematic bias is that of an accurate but miscalibrated thermometer, where the spread of measurements at fixed temperature is small, but all measurements are off by some large factor.
Now with D-K the proposed problem is statistical autocorrelation, not systematic bias, due to lack of independence, as here:
> "Subtracting y – x seems fine, until we realize that we’re supposed to interpret this difference as a function of the horizontal axis. But the horizontal axis plots test score x. So we are (implicitly) asked to compare y – x to x"
Regardless, it's fairly obvious that D-K enthusiasts are of the opinion that a small group of expert technocrats should be trusted with all the important decisions, as the bulk of humanity doesn't know what's good for it. This is a fairly paternalistic and condescending notion (rather on full display during the Covid pandemic as well). Backing up this opinion with 'scientific studies' is the name of the game, right?
It does vaguely remind me of the whole Bell Curve controversy of years past... in that case, systematic bias was more of an issue:
> "The last time I checked, both the Protestants and the Catholics in Northern Ireland were white. And yet the Catholics, with their legacy of discrimination, grade out about 15 points lower on I.Q. tests. There are many similar examples."
I am reminded of something my very accomplished PI (in the field of earth system science) confided privately to me once... "Purely statistical arguments," she said, "are mostly bullshit..."
> Regardless, it's fairly obvious that D-K enthusiasts are of the opinion that a small group of expert technocrats should be trusted with all the important decisions
It seems like you're roughly the only person who thinks this.
quanto|3 years ago
> Why so angry? I know I’ve taken this far too personally. I have no illusions that everything I read online should be correct, or about people’s susceptibility to a strong rhetoric cleverly bashing conventional science, even in great communities such as HN. But frankly, for the last few years, the world seems to be accelerating the rate at which it’s going crazy, and it feels to me a lot of that is related to people’s distrust in science (and statistics in particular). Something about the way the author conveniently swapped “purely random” with “null hypothesis” (when it’s inappropriate!) and happily went on to call the authors “unskilled and unaware of it”, and about the ease with which people jumped on to the “lies, damned lies, statistics” wagon but were very stubborn about getting off, got to me. Deeply. I couldn’t let this go.
I am afraid I actually agree with the author's point. The anti-intellectual, anti-scientific streak in many poor analyses claiming to debunk some scientific research is deeply concerning in our society. If someone is trying to debunk some scientific research, at least he should learn some basic analytic tools. This observation is independent of whether the original DK paper could have been better.
That said, I give the benefit of doubt to the author of "The DK Effect is Autocorrelation." It is a human error to be overly zealous in some opinions without thinking it through.
darawk|3 years ago
There is no pat "trust science more" or "trust amateurs less" answer here. The actual answer is that if you want to understand research, you need to actually understand mathematical statistics and the philosophy of statistics fairly deeply. There just isn't any way around it.
1. https://journals.plos.org/plosmedicine/article?id=10.1371/jo...
boppo1|3 years ago
I recall one study that said all white people are committing environmental racism against all non-white people. I dove in and read the whole thing wondering what method could have yielded scientific confidence in such a broad result. Turns out the model used was a semi-black box that required a request for access and a supercomputer to run. But it was in a Peer Reviewed Scientific Journal and had lots of Graduate Level Statistics so I guess it seemed trustworthy.
abirch|3 years ago
If only there were a term for "a cognitive bias whereby people with limited knowledge or competence in a given intellectual or social domain greatly overestimate their own knowledge or competence in that domain relative to objective criteria or to the performance of their peers or of people in general"
coffeeblack|3 years ago
c1ccccc1|3 years ago
Anyway, "The DK Effect is Autocorrelation" definitely seems to be both statistically literate, and a good faith criticism of the Dunning-Kruger paper. In light of that, calling it "anti-scientific" seems unfair, since criticism and debate are an important part of science.
heavyset_go|3 years ago
[1] https://rationalwiki.org/wiki/Essay:Second-option_bias
Rastonbury|3 years ago
Behavioural science is a pretty new field; it's pretty easy to get aberrant results or to manipulate the results to show 'something' statistically. Many findings in earlier papers could not be replicated, had applied statistics incorrectly, or showed different results when research participants were not white college kids.
This is a whole other problem within academia: the pressure to publish something even when there is nothing, and perceived legitimacy based on the number of citations a paper has. My professor always said don't look at the number of citations; understand the method and the rebuttals. There were numerous low-citation but solid papers showing flaws in famous ones, but everyone who isn't deep into the subject holds the original assertion to be legitimate because it's "famous".
legalcorrection|3 years ago
gambler|3 years ago
People endlessly reference the Dunning-Kruger effect as a meme, without ever having read the paper, let alone having checked its methods. You don't seem to have a problem with that.
On the other hand, after seeing an article that uses essentially statistical arguments to debate a scientific study you conclude that there is some "anti-intellectual, anti-scientific streak" in our society and that it should be of grave concern.
This doesn't make any sense except as an extreme case of virtue-signaling.
danbruc|3 years ago
Everyone is free to question the results, but after actually reading the entire paper I can confidently say that poking a bit at the correlation in the charts falls way short of undermining the actual findings from the paper. The actual results are much more detailed and nuanced than two straight lines at an angle.
[1] https://www.researchgate.net/publication/12688660_Unskilled_...
mike_hearn|3 years ago
1. It uses a tiny sample size.
2. It assumes American psych undergrads are representative of the entire human race.
3. It uses stupid and incredibly subjective tests, then combines that with cherry picking:
"Thus, in Study 1 we presented participants with a series of jokes and asked them to rate the humor of each one. We then compared their ratings with those provided by a panel of experts, namely, professional comedians who make their living by recognizing what is funny and reporting it to their audiences. By comparing each participant's ratings with those of our expert panel, we could roughly assess participants' ability to spot humor ... we wanted to discover whether those who did poorly on our measure would recognize the low quality of their performance. Would they recognize it or would they be unaware?"
In other words, if you like the same humor as professors and their hand-picked "joke experts" then you will be assessed as "competent". If you don't, then you will be assessed as "incompetent".
Of course, we can already guess what happened next: their hand-picked experts didn't agree on which of their hand-picked jokes were funny. No problem. Rather than realize this is evidence that their study design is maybe not reliable, they just tossed the outliers:
"Although the ratings provided by the eight comedians were moderately reliable (a = .72), an analysis of interrater correlations found that one (and only one) comedian's ratings failed to correlate positively with the others (mean r = -.09). We thus excluded this comedian's ratings in our calculation of the humor value of each joke"
The fact that this actually made it into their study at all, that peer reviewers didn't immediately reject it, and that the Dunning-Kruger effect became famous, is a great example of why people don't or shouldn't take the social sciences seriously.
MichaelBurge|3 years ago
So it seems people are bad at doing global rankings. If I tried to rank myself amongst all programmers worldwide, that seems really hard and I could see myself picking some "safe" above-average value just because I don't know that many other people.
There's also: if you took 1 piano class 30 years ago and can only play 1 simple song, that might put you in the 90th percentile worldwide just because most people can't play at all. But you might be at the 10th percentile amongst people who've taken at least 1 class. So doing a global ranking can be very difficult if you aren't exactly sure what the denominator set looks like.
So I think it's an artifact of using "ranking" as an axis. If the metric was, "predict the percentage of questions you got correct" vs. "predict your ranking", maybe people would be more accurate because it wouldn't involve estimating the denominator set.
cortesoft|3 years ago
bmacho|3 years ago
Yes, and this literally implies that people in the lowest quartiles can't and won't rate themselves to be in the lowest quartiles when they are forced to give an answer. (Especially on tests that don't measure anything (getting jokes? really?), on questions they have no knowledge about (how would they know how their classmates perform on an IQ test?), or on tests that just have a high variance.)
And therefore they will "overestimate their performance".
It's like grouping a bunch of random people and forcing them to answer whether their house is short, average, or tall. The "people living in short houses" will "overestimate the height of their houses", while the "people living in towers" will humbly say they live in a house of average height.
Is this an existing and relevant psychological phenomenon, different from the general inability to guess unknown things? I don't think so.
If you think so, then give me proof.
fshbbdssbbgdd|3 years ago
The two extremes would be that 1) self-assessment is perfectly correlated with skill, or 2) completely uncorrelated. I think neither of these makes sense as a null hypothesis.
The model you describe matches my intuition about what we should expect: people know something about their own skill level, but not everything.
0x20cowboy|3 years ago
galaxyLogic|3 years ago
When you learn something, you also learn what some of the possible mistakes are. You then evaluate your performance against the mistakes you didn't make. Consider a piano player or a figure skater: you have to know which figures are difficult to perform in order to evaluate a performance, and you don't know which ones are difficult until you have studied and tried to perform them.
dahart|3 years ago
It’s been argued before that this is the only reason that DK gained any notoriety; because it feels right, not because it is right. It’s the “just-world” theory: we want to believe that confident people are overcompensating.
Is it actually intuitive though? Consider your own example. Most people who don’t know piano or figure skating are well aware that they don’t know, and do not rate themselves highly at all. Would it be surprising to learn that people who don’t know any law or engineering don’t often hold any doubts about their lack of skill, and by and large are not deluded nor erroneously believe they’re great at these things they don’t know?
The DK paper didn’t measure knowledge-based skills like piano, figure skating, or law. It measured things like the ability to get a joke, and conversational grammar. How would you rate your own ability to get a joke? (Does this question really even make a lot of sense?)
It’s important that the methods in the DK paper focused on tasks that are hard to self-evaluate, because when people have tried to replicate DK with more well defined knowledge-based activities, they have often demonstrated the complete opposite effect, that there is widespread impostor syndrome, and skilled people underestimate themselves.
sam0x17|3 years ago
juancn|3 years ago
Conversely, it also cannot go under zero, so the error is most likely going to be above the actual skill line (overestimation, since the estimate is clamped from below).
djmips|3 years ago
skybrian|3 years ago
If you had never listened to a professional play piano before then you'd have no idea what level of performance is possible. Similarly, if you had never seen skilled skaters perform on TV.
But we have done these things, so it's obvious that they're doing something that's very difficult.
Maybe you don't fully appreciate the skill, though. You wouldn't do well as a judge who compares the performances of professionals. But comparing novices to professionals seems easy?
waynesonfire|3 years ago
I linked to the start of the video where he begins to build the idea. TLDR is he mentions Monet painting Impression Sunrise and how it was something that people have never seen before and it took a bit of time for it to blow people away--they needed to develop "new eyes" to see the genius. Adam then dives into this idea of "new eyes". I'm sure many of us have experienced this in our life and it was so nice to hear Adam unpack it.
https://www.youtube.com/watch?v=qE7dYhpI_bI&t=122s
javajosh|3 years ago
If human cultures can be characterized as default arrogant or default humble then it stands to reason that arrogant cultures will have a DK effect, and in humble cultures you won't.
darawk|3 years ago
TrackerFF|3 years ago
spiderfarmer|3 years ago
etchalon|3 years ago
A large portion of the HN audience really, really wants to think they're smarter than mostly everyone else, including most experts. Very few are. I'm certainly not.
Articles which "debunk" some commonly held belief, especially those wrapped in what appears to be an understandable, logical, followable argument, are going to be cat nip here.
Articles like this are even stronger cat nip. If a member of the HN audience wants to believe they're mostly smarter than mostly everyone else, that includes other members of the HN audience.
So, whenever I read an article and come away thinking that, having read the article, I'm suddenly smarter than a huge number of experts, especially if, like the original article, it's because I understand "this one simple trick!", I immediately discard that knowledge and forget I read it.
If the article is right, it will be debated and I'll see more articles about it, and it'll generate sufficient echoes in the right caves of the right experts. Once it does, I can change my view then.
I am not a statistician, or a research scientist. I have no idea which author is right. But, my spider sense says that if dozens of scientific papers, written by dozens of people who are, failed to notice their "effect" was just some mathematical oddity, that'd be pretty incredible.
And incredible things require incredible evidence. And a blog post rarely, if ever, meets that standard.
goosedragons|3 years ago
It's not that at all. The assumption should be that everyone is equally good (or bad) at assessing their performance: not that they have no ability, but that the mean ability is the same between groups rather than different. That is, the ability to assess themselves is independent of performance.
karpierz|3 years ago
Say that everyone is equally okay at assessing themselves, and gets within 0.1 of their actual performance (rated from 0 to 1). Then X and Y are going to be very correlated, as X - 0.1 < Y < X + 0.1. But X - Y will look like a random plot, since Y is randomly sampled around X.
The only case where X and Y wouldn't correlate at all is if people have no ability to assess their performance (IE, Y isn't sampled around X, but is instead sampled from a fixed range).
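A quick sketch of that point in Python (the uniform ±0.1 error is my assumption; the comment only bounds the error):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 10_000)           # actual performance
    y = x + rng.uniform(-0.1, 0.1, 10_000)  # self-assessment within 0.1 of the truth

    print(np.corrcoef(x, y)[0, 1])      # ~0.98: X and Y are strongly correlated
    print(np.corrcoef(x, y - x)[0, 1])  # ~0.00: the error alone looks like pure noise

So a population that self-assesses quite well still produces a "random-looking" error plot.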
blamestross|3 years ago
The less you know, the more random your guess at your own knowledge is. The actual value is low and less than zero isn't an option, so this drags the average up consistently.
The more you know, the more accurate your guess of your knowledge is. Especially as you hit the limits of the test, this noise can only drag the average down, but less dramatically than the other case.
With the reasonable conclusion: We all suck at guessing how much we know, but the more you know the less you suck until you hit the limits of the framework you are using for quantization of knowledge.
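A minimal simulation of this floor/ceiling story (all parameters are mine, for illustration only):

    import numpy as np

    rng = np.random.default_rng(0)
    score = rng.uniform(0, 100, 100_000)
    spread = 40 * (1 - score / 100) + 5  # less knowledge -> noisier guess
    guess = np.clip(score + rng.normal(0, spread), 0, 100)  # guesses can't leave [0, 100]

    for lo, hi in [(0, 25), (75, 100)]:
        m = (score >= lo) & (score < hi)
        print((lo, hi), round((guess[m] - score[m]).mean(), 1))
    # bottom quartile: mean error well above 0 (the floor drags guesses up)
    # top quartile: mean error below 0 (the ceiling drags guesses down)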
twobitshifter|3 years ago
majormajor|3 years ago
_0ffh|3 years ago
zharknado|3 years ago
I’m not a statistician but I do have some basic training in psychometrics. It might be interesting/helpful to point out that your priors about self-assessment seem more reasonable generally but also put a lot of faith in the test’s validity as a measure of skill.
I’m relying on intuition here, but it seems a little problematic that the actual score and the predicted score are both bound to the same measurement scheme. Given that constraint on some level we’re not really talking about an external construct of skill, just test performance and whether people estimate it well. Which is different from estimating their skill well.
Maybe someone with more actual skill can elaborate or correct haha.
parentheses|3 years ago
jasonhansel|3 years ago
It serves mostly as a way of reassuring themselves of their own superiority. The message (for them) basically amounts to "other people's claim to knowledge is just further proof that they don't know anything."
sjmm1989|3 years ago
semanticjudo|3 years ago
omnicognate|3 years ago
Unfortunately the original article isn't very clearly explained, and it's only on reading the discussion in the comments under it that it becomes clear what it's actually saying.
The point is about signal & noise. Say your random variable X contains a signal component and a noise component, the former deterministic and the latter random. Say you correlate Y-X against X, and further say you use the same sample of X when computing Y-X as when measuring X. In this case your correlation will include the correlation of a single sample of the noise part of X with its own negation, yielding a spurious negative component that is unrelated to the signal but arises purely from the noise. The problem can be avoided by using a separate sample of X when computing Y-X.
The example in the original "DK is autocorrelation" article is an extreme illustration of this. Here, there is no signal at all and X is pure noise. Since the same sample of X is used a strong negative correlation is observed. The key point though is that if you use a separate sample of X that correlation disappears completely. I don't think people are realising that in the example given the random result X will yield another totally random value if sampled again. It's not a random result per person, it's a random result per testing of a person.
This is only one objection to the DK analysis, but it's a significant one AFAICS. It can be expected that any measurement of "skill" will involve a noise component. If you want to correlate two signals both mixed with the same noise sources you need to construct the experiment such that the noise is sampled separately in the two cases you're correlating.
Of course the extent to which this matters depends on the extent to which the measurement is noisy. Less noise should mean less contribution of this spurious autocorrelation to the overall correlation.
To give another ridiculous, extreme illustration: you could throw a die a thousand times and take each result and write it down twice. You could observe that (of course) the first copy of the value predicts the second copy perfectly. If instead you throw the die twice at each step of the experiment and write those separately sampled values down you will see no such relationship.
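The die example translates directly into a sketch (pure-noise case, as in the autocorrelation article; the specifics are mine):

    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.uniform(0, 100, 10_000)  # one noisy "measurement" of skill
    x2 = rng.uniform(0, 100, 10_000)  # an independent re-measurement
    y = rng.uniform(0, 100, 10_000)   # self-assessment, also pure noise

    print(np.corrcoef(x1, y - x1)[0, 1])  # ~ -0.71: noise correlated against itself
    print(np.corrcoef(x2, y - x1)[0, 1])  # ~  0.00: gone with a fresh sample of X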
andersource|3 years ago
What you're saying is that we need to verify the statistical reliability of the skill tests DK gave, and to some extent that we need to scrutinize the assumption that there indeed is such a thing as "skill" to be measured in the first place. I hope we can both agree that skill exists. That leaves the test reliability (technical term from statistics, not in the broad sense).
What's simulated by purely random numbers is tests with no reliability whatsoever. Of course if the tests DK gave to subjects don't actually measure anything at all, the DK study is meaningless. If that's what the original article's author is trying to say, they sure do it in a very roundabout way, not mentioning the test reliability at all. I'd be completely fine reading an article examining the reliability of the tests. Otherwise, I again fail to see how the random number analysis has anything to do with the conclusions of DK.
In fact, DK do concern themselves with the test reliability, at least to some extent. That doesn't appear in the graph under scrutiny but appears in the study.
If you assume the tests are reliable, and you also assume that DK are wrong in that people's self-assessment is highly correlated with their performance, and generate random data accordingly, you'll still get no effect even if you sample twice as you propose.
> The key point though is that if you use a separate sample of X that correlation disappears completely
Separate sample of X under the assumption of no dependence at all on the first sample, i.e., assuming there is no such thing as skill, or assuming completely unreliable tests. So, not interesting assumptions, unless you want to call the test reliability into question, which neither you nor the author are directly doing.
seniortaco|3 years ago
My understanding is that the hypothesis is "Those who are incompetent overestimate themselves, and experts underestimate themselves".
DK says: True
DK is Autocorrelation says: ???
"I cant let go..." says: True?
HN says: also True?
Is there really any debate here? The "DK is Autocorrelation" article seems to be the only odd one out, and it's not clear if it even makes a proposal either way about the DK hypothesis. It talks about the Nuhfer study, but that seems Apples vs Oranges since it buckets by education level. Then it also points out that random noise would also yield the DK effect. But that also does not address the DK hypothesis, and it would indeed be very surprising if people's self evaluation was random!
So should my takeaway here just be that the DK hypothesis is True and that this is all arguing over details?
vetleen|3 years ago
DK is Autocorrelation says: The DK article is based on a false premise; we've got to disregard it.
"I cant let go..." says: Actually, given that we assume people are somewhat capable of self-assessment, which is reasonable, "DK is Autocorrelation" is the one based on a false premise, and we should disregard that one instead, and not DK.
formerly_proven|3 years ago
The DK hypothesis is "double burden of the incompetent": "Because incompetent people are incompetent, they fail to comprehend their incompetence and therefore overestimate their abilities more than expertes underestimate theirs"
Arguably the hypothesis that matches the data from the DK paper best is: "Everyone thinks they're average regardless of skill level"
hef19898|3 years ago
I hate it when people are "solely2 using statistics, and other first-principle thinking approaches, to understand well researched and documented topics. And I hate it if people use solely statistics to criticize research without considering the other aspects of it. Does it mean the DK effect can be discarded or not? I don't know, I think some disagreement over the statistical methods is not enough to come to any conclusion.
Attacking the Dunning-Kruger study only on statistical grounds looks like aprime example of the DK effect in itself...
john_pryan|3 years ago
Observation: if you rank via true "skill" and assume that, for a particular instance, the predicted performance and observed performance are independent but both have the true skill as their mean, you don't observe the effect. CC of 0.00332755.
If you rank via observed performance and plot observed vs predicted, the effect is there. CC of -0.38085757.
This assumes very simple Gaussian noise, which is not going to be accurate, especially as most of these tasks have normalised scores.
Edit: fixed wrong way around
https://colab.research.google.com/drive/1Vy7JjkywxwEP8nfR6oS...
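For readers who don't want to open the notebook, a sketch along the lines of that description (parameter choices are mine, so the exact numbers differ):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    skill = rng.normal(50, 15, n)
    observed = skill + rng.normal(0, 10, n)   # test score: true skill plus noise
    predicted = skill + rng.normal(0, 10, n)  # self-assessment: same mean, independent noise

    # Ranking by true skill: no effect.
    print(np.corrcoef(skill, predicted - observed)[0, 1])     # ~ 0.00
    # Ranking by observed score: the "effect" appears.
    print(np.corrcoef(observed, predicted - observed)[0, 1])  # ~ -0.39
    # Shrinking the noise sd (i.e. more reliable tests) shrinks that negative CC toward 0.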
andersource|3 years ago
What your simulation includes and the original article didn't (and I didn't touch at all in my article) is the statistical reliability of the tests they administered. Where you got a CC of -0.38 you used equal reliability (/ unreliability) of the skill tests and self-assessments. You can see that as you increase the test reliability, the CC shrinks and the effect disappears.
I have no idea what the actual reliability of the DK tests is; they do seem to consider it, but maybe not thoroughly enough. In my view it's very fair to criticize DK from that angle. But that would require looking at the actual tests and their data.
My point being, that any purely random analysis is based on assumptions that can easily be tweaked to show the same effect, the opposite effect, or no effect at all.
kybernetikos|3 years ago
sandgiant|3 years ago
IshKebab|3 years ago
So my suspicion is that the DK effect is not really a symptom of people's inability to accurately self-assess, but their unwillingness to accurately report that self-assessment.
And I don't think it is unique to self assessment either. It's common knowledge that ratings on a scale out of 10 for pretty much everything are nearly always between 6 and 9.
I don't know how they did the experiment but I bet they'd get different results if the self-assessments were anonymous and accuracy came with a big financial reward.
Anyway that's all irrelevant to the point of the article which I think is correct.
nosefrog|3 years ago
And that's the point the original article was trying to make ("The reason turns out to be embarrassingly simple: the Dunning-Kruger effect has nothing to do with human psychology. It is a statistical artifact — a stunning example of autocorrelation."), though that point does get lost a bit as it goes on.
I think this article gives a better summary of how the Dunning-Kruger effect probably isn't a psychological effect: https://www.mcgill.ca/oss/article/critical-thinking/dunning-...
civilized|3 years ago
This isn't true either. Statistical dependence does not determine or uniquely identify causal interpretation or system structure. See Judea Pearl's works (e.g. The Book of Why) for more on this.
People lacking the ability to self-assess is interesting psychologically. People can learn from experience in many other contexts. People can judge their relative position versus other people in many contexts. Why would they be so bad at this particular task? There could be a psychological underpinning.
Even if it turns out we have useless noise-emitting fluff in the place that would produce self-awareness of skill, that would be a psychological cause of a psychological effect. Not the ones that Dunning and Kruger believed they were seeing, but still.
Now, if you asked frogs for a self-assessment of skill, I would expect that data would not show any psychological effects.
anonymoushn|3 years ago
randcraw|3 years ago
1) an incompetent person is poorer than average at self assessment of their skill
2) as a person's competence increases at a skill, their ability to self-assess improves, until they become 'expert' which is defined by underappreciating their own skill (or overappreciating the skill of others)
3) DK is surprising (interesting) only when some incompetent persons who suffer from DK cannot improve their performance, presumably because their poor self-assessment prevents their learning from experience or from others.
4) Worse yet, some persons suffering from DK cannot improve their performance in numerous skill areas, presumably because their poor self-assessment is caused by a broad cognitive deficit (e.g. political bias), preventing them from improving on multiple fronts (which are probably related in some thematic way).
If DK is selective to include only one or two skill areas, as in case 3, that is not especially surprising, since most of us have skill deficits that we never surmount (e.g. bad at math, bad at drawing, etc). DK becomes surprising only in case 4, when we claim there is a select group of persons who have broad learning deficits, presumably rooted in poor assessment of self AND others — to wit, they cannot recognize the difference between good performance and bad, in themselves or others. Presumably they prefer delusion (possibly rooted in politics or gangsterism) to their acknowledgement of enumerable and measurable characteristics that separate superior from inferior performance, and that reflect hard work leading to the mastery of subtle technique.
If case 4 is what makes DK surprising, then DK certainly is not described well by the label 'autocorrelation' — which seems only to describe the growth process of a caterpillar as it matures into a butterfly.
bryanrasmussen|3 years ago
The surprising thing about DK, to me at any rate, is how unvarying it is in application. Under DK, people who are poor at something never think "wow, I really suck at this", or if they do, they are such a minuscule part of the population that we can discount them.
I've known lots of people who were not good at particular things and did not rate themselves as competent at it, although truth is they might have claimed competence if asked by someone they didn't want to be honest with.
watwut|3 years ago
haberman|3 years ago
The author says as much in this article:
> Why so angry? [...] [Frankly], for the last few years, the world seems to be accelerating the rate at which it’s going crazy, and it feels to me a lot of that is related to people’s distrust in science (and statistics in particular). Something about the way the author conveniently swapped “purely random” with “null hypothesis” (when it’s inappropriate!) and happily went on to call the authors “unskilled and unaware of it”, and about the ease with which people jumped on to the “lies, damned lies, statistics” wagon but were very stubborn about getting off, got to me. Deeply. I couldn’t let this go.
It's true, the previous article (https://economicsfromthetopdown.com/2022/04/08/the-dunning-k...) was pretty harsh on the authors of the original paper:
> In their seminal paper, Dunning and Kruger are the ones broadcasting their (statistical) incompetence by conflating autocorrelation for a psychological effect. In this light, the paper’s title may still be appropriate. It’s just that it was the authors (not the test subjects) who were ‘unskilled and unaware of it’.
But on some level, the original paper sounds just as condescending and dismissive. It presents a scholarly and statistical framework for looking down on "the incompetent" (a phrase used four times in the original paper). In practice, most of the times I see the DK effect cited, it functions as a highbrow and socially acceptable way of calling someone else stupid, in not so many words.
Cards on the table, I've never liked DK discourse for this reason. It's always easy to imagine others as the "Unskilled and Unaware", and for this reason bringing DK into any discussion rarely generates much insight.
jasonhansel|3 years ago
I think it's even worse than that: it's also a socially acceptable way of enforcing credentialism and looking down on others for not having a sufficiently elite education.
insaider|3 years ago
lamontcg|3 years ago
kurthr|3 years ago
torginus|3 years ago
For example, if someone gave me (or you) a leetcode-style test, told me I'd be competing against a sample picked from the general population, and asked me how well I did, I'd probably rate myself near the top with high confidence.
Conversely, if my competitors were skilled competitive coders, I'd put myself near the bottom, again with high confidence.
Now, if I had to compete with a different group, say my college classmates or fellow engineers from a different department, I'd be in trouble. If I scored high, what does that mean? Maybe others scored even higher. Or if I couldn't solve half of the problems, maybe others could solve even fewer. The point is, I don't know.
In that case the reasonable approach for me would be to assume I'm in the 50th percentile, then adjust it a bit based on my feelings - which is basically what happened in this scenario, and would produce the exact same graph if everyone behaved like that.
No need to tell tall tales of humble prodigies and boastful incompetents.
jldugger|3 years ago
I find the use of quartiles suspicious, personally. It's very nearly the ecological fallacy[1].
> I’m not going to start reviewing and comparing signal-to-noise ratios in Dunning-Kruger replications
DK has been under fire for a while now, nearly as long as the paper has existed[2]. At present, I am in the "effect may be real but is not well supported by the original paper" camp. If DK wanted to, they could release the original data, or otherwise encourage a replication.
[1]: https://en.wikipedia.org/wiki/Ecological_correlation [2]: https://replicationindex.com/2020/09/13/the-dunning-kruger-e...
geysersam|3 years ago
Two situations:
1. Average self-assessment coincides with true skill, but variance increases with low skill.
2. Average self-assessment is biased, and the bias is positive when you are unskilled and negative when you're highly skilled.
These two situations would create indistinguishable DK-graphs. I don't understand how anyone can be sure on either (1) or (2) after seeing one instance of such a graph.
As I see it, the only way out for "DK positivists" is to say that the DK hypothesis is unrelated to the truth values of (1) and (2). Or, that there is other evidence making DK convincing.
Neither seems very plausible!
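A hedged sketch of the two situations (all parameters invented; only the sign pattern matters):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    skill = rng.uniform(0, 100, n)

    # (1) unbiased on average, but noisier at low skill; the [0, 100] bounds do the rest
    sa1 = np.clip(skill + rng.normal(0, 1, n) * (35 - 0.3 * skill), 0, 100)
    # (2) genuinely biased toward the middle of the scale
    sa2 = 0.3 * skill + 0.7 * 55 + rng.normal(0, 8, n)

    q = np.digitize(skill, [25, 50, 75])  # quartile index 0..3
    for name, sa in [("(1) unbiased + variance", sa1), ("(2) biased", sa2)]:
        print(name, [round((sa - skill)[q == k].mean(), 1) for k in range(4)])
    # Both print positive mean error in the bottom quartile and negative in the top
    # (magnitudes differ, the sign pattern is the same): a DK-shaped quartile picture
    # from two very different underlying stories.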
closed|3 years ago
It's definitely related to ecological fallacy in the sense that both underestimate relative error and inflate effect sizes.
_carbyau_|3 years ago
If others can't replicate it entirely on their own without "encouragement", then it isn't useful at all, and the original experiment can be safely ignored as irrelevant to humanity, along with any "prestige" associated with it.
ComradePhil|3 years ago
This is not possible, so the self-assessment data will be random because it is a random guess... so it does not correlate with actual performance or anything else for that matter. Hence, the DK effect has to be a result of faulty statistical analysis.
I believe we'd have completely different results if the question was framed differently: "how many do you believe you got right?". Then, more confident people, regardless of competence, would answer that they got more right and less confident people, again regardless of competence, would believe that they must have gotten more wrong than they did.
irrational|3 years ago
I stopped reading at this point. Someone that is so certain that they say “I simply won’t believe you.” is too self-assured to be worth paying much attention to.
Scea91|3 years ago
LudwigNagasena|3 years ago
> “Never assume dependence” gets so ingrained that people stubbornly hold on to the argument in the face of all the common sense I can conjure. If you still disagree that assuming dependence makes more sense in this case, I guess our worldviews are so different we can’t really have a meaningful discussion.
Hypothesis testing is concerned with minimizing Type I and Type II errors. In the Neyman-Pearson framework this calls for a specific choice of the null hypothesis. Of course, nothing prevents you from defining the sets for H0 and H1 as arbitrarily as you want, as long as you can mathematically justify your results.
It seems like the author fundamentally misunderstands the basics of statistics.
dahart|3 years ago
It bugs me that DK reached popular consciousness and gets misinterpreted and misused more often than not. For one, the paper shows a positive correlation between confidence and skill. The paper is very clearly leading the reader, starting with the title. The biggest problem with the paper is not the methodology nor the statistics; it's that the waxy prose comes to a conclusion that isn't directly supported by their own data. People being unskilled and unaware of it is not the only explanation for what they measured, nor is it even particularly likely, since they didn't actually test anyone who is verifiably or even suspected to be incompetent. They tested only Cornell undergrads volunteering for extra credit.
denton-scratch|3 years ago
Put differently, if everyone's estimate was exactly the mean, you'd still see a "DK effect".
jcranberry|3 years ago
oh_my_goodness|3 years ago
If people wander off through the verbiage of any article, where the chatter isn't supported by data, sure, they'll tend to get speculation.
rafaeltorres|3 years ago
Imagine in the Dunning-Kruger chart the second plot (perceived ability) was a horizontal line at 70, which is not true but not far off from the real results. Now imagine I told you "did you know that, regardless of their actual score, everyone thought they got a 70?" That's a surprising fact.
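As a toy check (made-up quartile averages, not the paper's data), a flat guess mechanically produces the famous picture:

    import numpy as np

    score = np.array([10.0, 40.0, 60.0, 90.0])  # actual score, by quartile
    perceived = np.full(4, 70.0)                # everyone answers "about 70"
    print(perceived - score)  # [ 60.  30.  10. -20.]
    # Low scorers "overestimate", high scorers "underestimate", with zero psychology
    # beyond the flat guess itself.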
topaz0|3 years ago
clwk|3 years ago
aaaronic|3 years ago
I honestly think people take it way too seriously and apply it too generally. Quantifying "good" is hard if you don't know much about the field you're quantifying. Getting deep into a particular field is humbling: Tetris seems relatively simple, but there are people who could fill a book with things _I_ don't know about it, despite my having played at least a few hundred hours of it.
Does the humility gained by becoming an expert in one field translate into better self-assessment in other fields? I feel myself further appreciating the depth and complexity of fields I "wrote off" as trivial and uninteresting when I was younger as I get deeper into my own field (and see just how much deeper it is too).
jasonhansel|3 years ago
I think that often the opposite is true: people who become experts in one domain often assume that they are automatically experts in completely unrelated fields. I suspect that this is the cause of "Nobel disease": https://en.wikipedia.org/wiki/Nobel_disease
robocat|3 years ago
ncmncm|3 years ago
8note|3 years ago
What I'm really missing is a plot of the data without the aggregation. I find it very strange that X is broken down into quartiles but Y isn't, and that when bucketed into quartiles, people estimated their skills relative to each other quite well: the line still goes up, and from bottom to top it would be a perfect X-to-X correlation.
obastani|3 years ago
tpoacher|3 years ago
In partial "defense" of the "autocorrelation" article, the author was in fact arguing against their own perceived definition of DK, not what most people consider to be DK. They just didn't realise it.
Which is an all-too-common thing to begin with (that particular article pulled the same stunt with the definition of the word 'autocorrelation', after all).
sumanthvepa|3 years ago
jasonhong|3 years ago
stagas|3 years ago
weird-eye-issue|3 years ago
nokya|3 years ago
I don't have this luxury in my life right now but I admit after reading the "original" post almost a fourth time, I was really hoping someone would take the time to explain why/how the author could be completely wrong (or not).
Thanks.
emsign|3 years ago
The issue is not the way our brains generalize, but that you are using just one brain, one life's experience.
t_mann|3 years ago
Except that what you've plotted there isn't the growth rate but the absolute growth. Your argument for DK isn't convincing either; they claimed something much stronger than that we can't assess our own skills.
wodenokoto|3 years ago
https://news.ycombinator.com/item?id=31036800
brodouevencode|3 years ago
gverrilla|3 years ago
andi999|3 years ago
marcholagao|3 years ago
Daniel: >It’s not a “statistical artifact” - that will be your everyday experience living in such a world.
You can experience statistical effects. I think a lot of controversy comes from how Dunning and Kruger's paper leads people to interpret the data as hubris on the part of low-performers, and the statistical analysis demolishes that interpretation. Not knowing how well you performed is not the same thing psychologically as "overestimating" your performance.
jameshart|3 years ago
Dunning-Kruger is precisely about the surprising result that people are bad at estimating their performance!
If you accept the 'D-K is autocorrelation' argument, you don't get to throw out the existence of the D-K effect: you are saying Dunning and Kruger failed to show that humans have any ability to estimate how skilled they are at all.
That seems like an even more radical position than the D-K thesis.
anonymoushn|3 years ago
You are sort of smuggling in the assumption, for example, that Olympic medalist lifters, when asked how much they can deadlift, will have the same distribution of answers as people who never deadlift (but are aware that totally sedentary men can probably deadlift like 200lbs and totally sedentary women can probably deadlift like 150lbs). If this were true, it would be worth publishing a paper about it.
It's sort of surprising to me to read your comment, because TFA is an extended rebuttal of it.
> I think a lot of controversy comes from how Dunning and Kruger's paper leads people to interpret the data as hubris on the part of low-performers, and the statistical analysis demolishes that interpretation. Not knowing how well you performed is not the same thing psychologically as "overestimating" your performance.
D-K actually found that low performers were less accurate at assessing their skill than high performers, and the article you refer to obviously did not find this effect in random data, so I'm not sure how it was demolished.
NumberCruncher|3 years ago
A sentence written by the author, commented on by me, and read by the HN community, all on devices which exist only thanks to 80-90 years of rigorous, statistics-based QA in engineering, especially in mechanical/hardware engineering.
Anyhow, after spending years on a team filled with social science PhDs, I would not waste my time reading papers about statistical analysis done by social scientists.
Traster|3 years ago
> We don't [only] need [the formal discipline of mathematics known as] statistics to learn about the world.
Sure, there are things you can only functionally ascertain through statistical analysis. But not everything in the world needs rigorous statistics.
TimPC|3 years ago
I feel like the author read the autocorrelation result, hated it, and ignored the central point. There are ways to bucket data that remove the autocorrelation, and in those experiments we also see the DK effect disappear. Trying to argue that we should study the effect with the autocorrelation present but ignore the autocorrelation for 'reasons' is not the way forward.
longtimegoogler|3 years ago
obastani|3 years ago
However, by "random data", the original blog means people and their self-assessments are completely independent! In fact, this is exactly what the DK effect is saying -- people are bad at self-evaluating [2]. (More precisely, poor performers overestimate their ability and high performers underestimate their ability.) In other words, the premise of the original blog post [1] is exactly the conclusion of DK!
Looking at the HN comments cited [3] by the current blog post, it appears that the main point of contention from other commenters was whether the DK effect means uncorrelated self-assessment or inversely correlated self-assessment. The DK data only supports the former, not the latter. I haven't looked at the original paper, but according to Wikipedia [2], the only claim being made appears to be the "uncorrelated" claim. (In fact, it is even weaker, since there is a slight positive correlation between performance and self-assessment.)
So, my conclusion would be that DK holds, but it does depend on what exactly the claim in the original DK paper is.
[1] https://economicsfromthetopdown.com/2022/04/08/the-dunning-k...
[2] https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
[3] https://news.ycombinator.com/item?id=31036800
saurik|3 years ago
> I haven't looked at the original paper, but according to Wikipedia [2], the only claim being made appears to be the "uncorrelated" claim.
Is it that hard to actually check the original paper before bothering to make such a claim? The original paper explicitly claims to examine "why people tend to hold overly optimistic and miscalibrated views about themselves".
hgomersall|3 years ago
Yeah, the model is a simple linear model (which I've yet to see written down) with some correlation coefficient which is the unknown. Derive an estimator for that correlation coefficient, being explicit about the assumptions, then we can have a discussion. Until then it's all lots of noise. The raw data would help too.
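For concreteness, here is one way such a simple linear model could be written down (my formulation, purely illustrative; it appears in neither the paper nor either article):

    y_i = \bar{x} + \rho (x_i - \bar{x}) + \varepsilon_i,    \varepsilon_i ~ N(0, \sigma^2)

where x_i is measured skill and y_i is self-assessment; \rho = 0 gives the pure-noise world of the autocorrelation article, and \rho = 1 perfect self-assessment. Given raw data, the OLS slope of y on x estimates \rho, with the assumptions on the table.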
devit|3 years ago
Their claim that "If we have been working with random numbers, how could we possibly have replicated the Dunning-Kruger effect?" is the first blatantly false statement, and then the rest is built upon that so it can be safely disregarded.
It's easy to see this because while the effect is present if everyone evaluates themselves randomly, it's not present if everyone accurately evaluates themselves, and these are both clearly possible states of the world a priori, so it's a testable hypothesis about the real world, contrary to the bizarre claim in the paper.
Also, the knowledge that the authors published that article provides evidence for the Dunning-Kruger effect being stronger than one would otherwise believe.
ta123457864|3 years ago
Your comment amounts to saying that some of the randomly generated data really is consistently overestimating its performance. How absurd.
Like similar analyses here, you don't factor in that DK is about bias. Of course you can't see bias when test score = self-assessment. That's because "IF everyone perfectly knows their score then there is no bias in their assessment" is a tautology.
soVeryTired|3 years ago
That original article was bogus and needlessly combative. I feel like the majority view in the HN comments saw it as such.
Most comments were splitting hairs on what _exactly_ the Dunning-Kruger effect was, plus some general nerd-sniping on how the original article was off base.
IMO it was something that fell flat on its own rather than something that needed a lengthy refutation, but I can understand that sometimes these things get under your skin.
prvc|3 years ago
Just based on the graph under the "The Dunning-Kruger Effect" section, one observation I'd like to present is that the subjects' numerical self-assessments fall into the same range as passing but non-stellar grades do in school. This may reflect a psychological bias in how the subjects use and understand percentages. Accordingly, the fact that the two lines cross is a red herring.
ipnon|3 years ago
murrayb|3 years ago
The corollary of Dunning-Kruger is that everyone is equally capable, and equally capable of assessing their performance. This nicely suits the current social rhetoric but does not match observed reality.
Edit: see below, I meant opposite, not corollary.
smw|3 years ago
keshet|3 years ago
photochemsyn|3 years ago
Any discussion of statistics-based reasoning should include the concept of systematic bias, and that's not mentioned in this article at all. An example of systematic bias is that of a precise but miscalibrated thermometer, where the spread of measurements at fixed temperature is small, but all measurements are off by some large factor.
Now with D-K the proposed problem is statistical autocorrelation, not systematic bias, due to lack of independence, as here:
> "Subtracting y – x seems fine, until we realize that we’re supposed to interpret this difference as a function of the horizontal axis. But the horizontal axis plots test score x. So we are (implicitly) asked to compare y – x to x"
Regardless, it's fairly obvious that D-K enthusiasts are of the opinion that a small group of expert technocrats should be trusted with all the important decisions, as the bulk of humanity doesn't know what's good for it. This is a fairly paternalistic and condescending notion (rather on full display during the Covid pandemic as well). Backing up this opinion with 'scientific studies' is the name of the game, right?
It does vaguely remind me of the whole Bell Curve controversy of years past... in that case, systematic bias was more of an issue:
> "The last time I checked, both the Protestants and the Catholics in Northern Ireland were white. And yet the Catholics, with their legacy of discrimination, grade out about 15 points lower on I.Q. tests. There are many similar examples."
https://www.nytimes.com/1994/10/26/opinion/in-america-throwi...
I am reminded of something my very accomplished PI (in the field of earth system science) confided privately to me once... "Purely statistical arguments," she said, "are mostly bullshit..."
anonymoushn|3 years ago
> Regardless, it's fairly obvious that D-K enthusiasts are of the opinion that a small group of expert technocrats should be trusted with all the important decisions
It seems like you're roughly the only person who thinks this.