Saturday, June 26, 2010

Re-post: The Dunning-Kruger effect

Note: This post is a re-post of my write-up of the Dunning-Kruger effect. Since I originally wrote it, the link to the journal article has stopped working, so I've pointed it at another copy elsewhere on the internet.

We've all run into people who seem unable to estimate their own (lack of) skill in a particular area. It might be as extreme as the Creationist who tries to explain science to the scientist, but it can also be people in your day-to-day life who overestimate their own skills in one thing or another.

There are two explanations for this. One is the "above-average syndrome", the other is the Dunning-Kruger effect.

The "above-average syndrome" is, simply put, the observation that the average person in a given field will believe themselves to be above average. In other words, more people believe themselves above average than can really be the case. Obviously, only 50% can be above the median (and, for a roughly symmetric distribution, above the average), but perhaps 80% believe they are.
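As a quick sanity check on that arithmetic (the 80% figure above is just an illustration), here's a small Python sketch showing that, whatever the distribution of skill, essentially half of any population sits above its own median by construction:

```python
import random

# Simulate 10,000 people with normally distributed "skill".
random.seed(0)
skills = [random.gauss(0, 1) for _ in range(10_000)]
median = sorted(skills)[len(skills) // 2]

# By definition, (at most) half the population can sit above the median.
above = sum(s > median for s in skills) / len(skills)
print(f"fraction above median: {above:.2f}")  # ~0.50, regardless of the distribution
```

So whenever a survey finds 80% rating themselves "above average", at least 30% of respondents must be miscalibrated.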

The Dunning-Kruger effect is related to the above-average syndrome, but it's one explanation of why this syndrome exists (there can be other reasons). The effect is named after Justin Kruger and David Dunning, who conducted a series of experiments whose results they published in the Journal of Personality and Social Psychology in December 1999. The title of the article was Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments (.pdf), which to my mind is one of the greatest titles I've ever seen on an article.

I highly recommend downloading and reading the article. It's fairly straightforward, and you don't need any background in psychology to understand it.

Dunning and Kruger looked at the above-average effect and formed the hypothesis that it takes skill to evaluate yourself. With that hypothesis in mind, they designed a number of experiments to either support or refute it. Since I'm writing about the effect now, you've probably already figured out that their experiments supported their hypothesis.

Their experiments were fairly straightforward:
- Ask people to do some tests.
- Get people to evaluate how well they did compared to others.

At later tests they also included the following:
- Show people how others did.
- Get people to re-evaluate their level compared to others.

First they put people through a number of tests in different areas, and afterward asked them to estimate, based on their perceived skill level, how well they had done compared to others. The results can be seen in the following figure from the paper.

Dunning-Kruger results 1

As can be clearly seen, while the trend line of the perceived skill level pointed in the right direction, every quartile, on average, placed itself in the 3rd quartile. In other words, while the people in the 1st quartile rated themselves lower than the people in the 2nd quartile did, they vastly overestimated their abilities compared to others (by some 50 percentage points).
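To make the pattern concrete, here's a small Python sketch. The numbers are illustrative only: actual percentiles are quartile midpoints, and the perceived percentiles are rough approximations of the pattern in the paper's figure, not exact data from the article:

```python
# Illustrative numbers only: actual = quartile midpoints; perceived values are
# rough approximations of the pattern in Kruger & Dunning's figure, not exact data.
actual_percentile = {1: 12.5, 2: 37.5, 3: 62.5, 4: 87.5}
perceived_percentile = {1: 62, 2: 66, 3: 70, 4: 74}  # every quartile lands in the 3rd

miscalibration = {q: perceived_percentile[q] - actual_percentile[q] for q in range(1, 5)}
for q, gap in miscalibration.items():
    print(f"quartile {q}: perceived {perceived_percentile[q]}th vs actual "
          f"{actual_percentile[q]}th ({gap:+.1f} points)")
```

The bottom quartile's gap comes out at roughly +50 points, while the top quartile's is slightly negative, matching the description above.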

As the study says, there are two potential sources of this miscalibration, which Dunning and Kruger tried to evaluate.

Finally, we wanted to introduce another objective criterion with which we could compare participants' perceptions. Because percentile ranking is by definition a comparative measure, the miscalibration we saw could have come from either of two sources. In the comparison, participants may have overestimated their own ability (our contention) or may have underestimated the skills of their peers.


They tested this by not only asking people to compare themselves with others, but also asking them to estimate how many questions they thought they had answered correctly. If their estimates of their own raw scores had been accurate, it would mean they had underestimated the skills of their peers rather than overestimated their own. This was, unfortunately, not the case. It turned out that people were actually pretty good at estimating how many correct answers would correspond to their claimed standing, but were not able to estimate how many questions they themselves had answered correctly.

In other words, it was not a failure to estimate others; it was a failure to estimate themselves (this held true across all quartiles, but was most striking among the lowest quartile).

All in all, the test results pretty much supported Dunning and Kruger's hypothesis, but while they demonstrated people's inability to estimate themselves, they didn't really address whether people would have the skill-set to re-evaluate their ability.

This is also something Dunning and Kruger set out to test.

Participants. Four to six weeks after Phase 1 of Study 3 [a grammar test] was completed, we invited participants from the bottom- (n = 17) and top-quartile (n = 19) back to the laboratory in exchange for extra credit or $5. All agreed and participated.
Procedure. On arriving at the laboratory, participants received a packet of five tests that had been completed by other students in the first phase of Study 3. The tests reflected the range of performances that their peers had achieved in the study (i.e., they had the same mean and standard deviation), a fact we shared with participants. We then asked participants to grade each test by indicating the number of questions they thought each of the five test-takers had answered correctly.

After this, participants were shown their own test again and were asked to re-rate their ability and performance on the test relative to their peers, using the same percentile scales as before. They also re-estimated the number of test questions they had answered correctly.


The results were very interesting, as can be seen in the two figures I've made based upon the numbers from the article.

Dunning-Kruger lowest quartile

Dunning-Kruger highest quartile

As the figures plainly show, the people in the highest quartile could use the new information to adjust their self-evaluation in the correct direction, though it was still too low. The people in the lowest quartile, on the other hand, were unable to properly estimate their own performance, and actually misjudged their standing even more afterward.

All in all, the tests supported Dunning and Kruger's hypothesis, and they give us a better understanding of why people are sometimes so bad at judging their own skill level.

Having said all that, I should probably mention that later researchers have disputed some of Dunning and Kruger's conclusions. In Skilled or Unskilled, but Still Unaware of It: How Perceptions of Difficulty Drive Miscalibration in Relative Comparisons (.pdf), Burson et al. argue that it's not just unskilled people who can have a hard time evaluating their own skill level.

People are inaccurate judges of how their abilities compare to others’. J. Kruger and D. Dunning (1999, 2002) argued that unskilled performers in particular lack metacognitive insight about their relative performance and disproportionately account for better-than-average effects. The unskilled overestimate their actual percentile of performance, whereas skilled performers more accurately predict theirs.
However, not all tasks show this bias. In a series of 12 tasks across 3 studies, the authors show that on moderately difficult tasks, best and worst performers differ very little in accuracy, and on more difficult tasks, best performers are less accurate than worst performers in their judgments. This pattern suggests that judges at all skill levels are subject to similar degrees of error. The authors propose that a noise-plus-bias model of judgment is sufficient to explain the relation between skill level and accuracy of judgments of relative standing.


In the Burson et al. study, the best quartile underestimated themselves when dealing with hard tasks, just as the worst quartile overestimated themselves when dealing with easy tasks.

Burson et al

This doesn't necessarily invalidate the Dunning-Kruger effect, but it does tell us that we can't rely on people to correctly evaluate themselves, no matter their skill level.


Friday, April 02, 2010

Not even bad science

Once in a while, one comes across a published study so bad that one cannot even call it bad science; it is so completely wrong that it has nothing to do with science at all.

I recently came across one such study.

For some strange reason, I had clicked on a homeopathy hashtag on Twitter, showing me the latest tweets about homeopathy. This is not something I recommend, as the stupidity is out in force there. Anyway, one of the most recent tweets at the time referred to a study purporting to demonstrate that homeopathy was as effective as anti-depressants. Unsurprisingly, I took a closer look at the study.

The study is this one: Homeopathic Individualized Q-potencies versus Fluoxetine for Moderate to Severe Depression: Double-blind, Randomized Non-inferiority Trial by U. C. Adler et al. (read it at your own risk)

My last post was triggered by reading this study, for reasons I will come to later, but let me just say for now that whoever was in charge of it was obviously a true believer in homeopathy, with very little grasp of science or of evaluating results.

What makes me say that? Well, let's tackle the claim about the person in charge being a true believer who doesn't understand science. Passages like this one should demonstrate why I say that.

Hahnemann's dynamization gained support of physics: thermoluminescence emitted by ‘ultra-high dilutions’ (dynamizations) of lithium chloride and sodium chloride was specific of the salts initially dissolved, despite their dilution beyond the Avogadro number (11).


I think I can safely say that no physicist would agree that Hahnemann's idea of dynamization has gained the support of physics - as a matter of fact, physicists would call Hahnemann's claims pure nonsense.

Don't believe me? Well, this is what Hahnemann had to say on the subject of dynamization:

This remarkable transformation of the properties of natural bodies through the mechanical action of trituration and succussion on their particles (while these particles are diffused in an inert dry or liquid substance) develops the latent dynamic powers previously imperceptible and as it were lying hidden asleep in them. These powers electively affect the vital principle of animal life. This process is called dynamization or potentization (development of medicinal power), and it creates what we call dynamizations or potencies of different degrees.


In other words, this is the claim that diluting the substance makes it more potent.
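For scale, here's a back-of-the-envelope count in Python. I'm assuming one mole of starting substance and the 1:50,000 dilution factor that defines Q (LM) potencies; after five serial dilutions you'd expect only a couple of molecules of the original substance left, and after six, essentially none:

```python
AVOGADRO = 6.022e23   # molecules per mole
Q_DILUTION = 50_000   # Q (LM) potencies dilute 1:50,000 at each step

def expected_molecules(moles: float, steps: int) -> float:
    """Expected number of original molecules left after `steps` serial dilutions."""
    return moles * AVOGADRO / Q_DILUTION**steps

for n in (3, 5, 6, 10):
    print(f"after {n} dilution steps: {expected_molecules(1.0, n):.3g} molecules")
```

So past half a dozen steps the "remedy" contains, in expectation, zero molecules of the substance it is named after, which is why the claim that dilution increases potency collides head-on with basic chemistry.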

The claim that physics supports this rests on a single article, Thermoluminescence of ultra-high dilutions of lithium chloride and sodium chloride (.pdf) by Louis Rey, which hasn't been replicated, and which certainly doesn't support Hahnemann's claim that the substance becomes more potent. Several people have also pointed out problems with Rey's study.

So, all in all, a good scientist would most certainly not make claims like "Hahnemann's dynamization gained support of physics" when the only possible support is a single, non-replicated study - in terms of science, this amounts to an unsupported claim.

Still, even though the people in charge of the study are true believers who don't really understand science, that doesn't mean the study can't be useful - after all, if done properly, the results should speak for themselves. So, let's return to the study.

Basically, the study was conducted by assigning patients into two groups - one group which would receive Fluoxetine and a placebo, and one group which would receive the homeopathic remedy and a placebo.

I think you can all see the issue there. Since homeopathy is placebo, what this study is really doing is comparing fluoxetine to placebo. Of course, the people conducting the study don't see it this way, but as I wrote in my last post: until homeopaths can explain how homeopathy works in terms that don't require everything we know about chemistry, physics, and physiology to be wrong, we can safely reject their claims.

But let's, for the sake of argument, accept that we are comparing two different types of remedies for depression, and look at what the study concludes.

This sample consisted of patients with moderate to severe depression, because their mean MADRS depression scores were close to the 31 score cut-off for moderate and severe depression (28). Initially, 284 subjects were screened, 105 of them met the inclusion criteria, 14 out of them did not attend the first appointment, 91 were randomized and 55 completed the 8-week trial. A detailed flow chart of subject progress through the study is shown in Fig. 1.




So, out of 91 randomized patients, only 55 completed the 8-week trial. That's a drop-out rate of roughly 40% - a significant number, and one that impacts the reliability of the study. Let's see how they address this later.
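The arithmetic, for the record (a trivial Python check on the numbers quoted from the study):

```python
randomized, completed = 91, 55   # figures quoted in the study
dropped = randomized - completed
dropout_rate = dropped / randomized
print(f"{dropped} of {randomized} dropped out ({dropout_rate:.1%})")  # 36 of 91 (39.6%)
```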

There were no significant differences between the proportions of excluded and lost for follow-up patients in the two groups (P = 0.99), though there was a trend toward greater treatment interruption for adverse effects in the fluoxetine group, as can be seen in Table 1.




Well, true, there was no significant difference between the proportions, but the reasons people were excluded differed quite a bit between the groups.

As they state, there was "a trend toward greater treatment interruption for adverse effects in the fluoxetine group" - hardly surprising, since that group was given actual medicine, which has side-effects, rather than placebo. What's more surprising is that 3 people were excluded from the homeopathy group for adverse effects. Since there are no active ingredients in homeopathic remedies, this must either be due to a negative placebo effect (the nocebo effect), or to a misdiagnosis of e.g. clinical worsening.

What they leave out is that there is a notable difference in clinical worsening between the two groups. Among the people receiving the medicine, one person was excluded because of worsening (approximately 2.3% of that group), while among the recipients of the homeopathic remedies, five were excluded for this reason (more than 10% of that group, 10.4% to be precise). Again, this is entirely in line with our knowledge of which group was receiving actual medicine, and it is certainly a large enough difference to deserve consideration when writing about the results, yet this was not done.
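Out of curiosity, I ran a quick Fisher's exact test on those exclusions myself. This is my own check, not anything from the paper, and the group sizes of 43 and 48 are inferred from the percentages quoted above. With groups this small the imbalance is suggestive rather than statistically conclusive, which is all the more reason it should have been reported and discussed:

```python
from math import comb

# My own quick check, not from the paper. Group sizes (43 fluoxetine, 48 homeopathy)
# are inferred from the quoted percentages: 1/43 ≈ 2.3%, 5/48 ≈ 10.4%.
fluox_n, fluox_worsened = 43, 1
homeo_n, homeo_worsened = 48, 5
total_n = fluox_n + homeo_n               # 91 randomized patients
total_worsened = fluox_worsened + homeo_worsened

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k of the K worsening cases fall in a group of size n, out of N)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# One-sided Fisher's exact test: how likely are 5 or more of the 6 worsening
# cases to land in the homeopathy group if exclusions were unrelated to treatment?
p_one_sided = sum(hypergeom_pmf(k, total_n, total_worsened, homeo_n)
                  for k in range(homeo_worsened, total_worsened + 1))
print(f"one-sided p = {p_one_sided:.3f}")  # ≈ 0.129
```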

The rest of the results section goes on to analyze the results for the people who made it through the 8-week period, yet addresses neither the high drop-out rate nor the differences in the reasons for exclusion.

Turning to the discussion section, a couple of things jump out, particularly this paragraph:

A placebo-arm was not included in the present study because it was not authorized by the National Ethic Council. Although placebo interventions are associated with mean response or remission rates of ~35% (37,38), a placebo effect cannot be ruled out, since the homeopathic Q-potencies were compared with an antidepressant and ‘it is becoming more and more difficult to prove that antidepressants—even well-established antidepressants—actually work better than placebo in clinical trials’ (39). Nevertheless, it also has to be taken into consideration that the antidepressant-placebo difference seems to be smaller in the trials aiming at mild to moderate depression (40,41) and the present sample consisted of patients suffering from moderate to severe depression. Placebo-controlled studies would be recommendable to clarify these findings.


The first part of the paragraph is what caused the ranting in my last post. I presume the National Ethic Council didn't allow a placebo arm in the study because it considered it unethical to subject people suffering from depression to placebo - yet by allowing this homeopathy study to go ahead, it subjected the very same type of people to something we know is placebo. What the hell is wrong with these people? Are they really so gullible that they don't realize this?

And reading the rest of the paragraph, it becomes clear that the people who conducted the study are aware that they were comparing homeopathic remedies to something which hasn't been shown to be much better than placebo. In other words, they are comparing one type of placebo, homeopathy, to something which might very well be another type of placebo, or which at the very least seems to have a very limited effect on top of the placebo effect. And we are supposed to be impressed that homeopathy is as effective?



Saturday, January 31, 2009

How distressed are people by racism?

While surfing the internet, I came across this report on some recent studies of how much racism affects us.

We Are Less Disturbed By Racism Than We Predict

Psychologists in Canada and the US suggest that people predict they will feel worse than they actually do after witnessing racial abuse and that while they think or say they would take action, they actually respond with indifference when faced with an act of racism. This is despite the fact that being labelled as a racist has become a powerful stigma in our society today.

Researchers from Departments of Psychology at York University in Toronto, the University of British Columbia, and Yale University in New Haven, Connecticut, performed the study, which is published on 9 January in Science.


So, while people might believe that they will feel bad about racism and react to it, the truth is that they do neither.

The study was published in Science; an abstract can be found here, which links to the full article behind the paywall.

Fortunately, it's also possible to find the article here (.pdf)

The authors of the study point to a paradox: while racism and racists are viewed more and more negatively, blacks still face racism regularly.

A recent survey (5) found that 67% of blacks indicated that they often face discrimination and prejudice when applying for a job, and 50% reported that they experienced racism when engaging in such common activities as shopping or dining out. For many blacks, derogatory racial comments are a common occurrence, and almost one-third of whites report encountering anti-black slurs in the workplace (6)


Obviously, the social stigma attached to racism has little deterrent effect, and one must ask why that is. The authors suggest that the "social deterrents to racism may be weaker than public rhetoric implies" - in other words, while people think they will stigmatize racists, they don't really do so in reality.

They set out to test this hypothesis by putting people in a situation where they experienced either no, mild, or strong racism, and later asking them how they would react in a "hypothetical" situation where they experienced the same behavior. Unsurprisingly, people said that they would react negatively, but their actual behavior didn't reflect this, nor did their later choice when they had to partner up with either the person uttering the remark or the subject of the remark.

All in all, it shows that people know intellectually how they are supposed to behave, but that there is a long way to go before people actually behave that way. Or, as the authors of the study write:

In particular, despite current egalitarian cultural norms and apparent good intentions, one reason why racism and discrimination remain so prevalent in society may be that people do not respond to overt acts of racism in the way that they anticipate: They fail to censure others who transgress these egalitarian norms. These findings provide important information on actual responses to racism that can help create personal awareness and inform interventions, thereby helping people to be as egalitarian as they think they will be.


Keep this in mind next time you experience racism. If you don't react, who will?


Tuesday, October 14, 2008

Survey for atheists

Jon Lanman, a DPhil student at Oxford University whom I have met a couple of times during his field-work visits to Denmark, is currently running a survey of atheists.

He would like some more Scandinavian answers, so I thought I'd link to it here from my blog. Don't worry, non-Scandinavians can also answer, Jon would just really like as many answers from Scandinavian atheists as possible.

Anyway, here is what Jon says about the survey.

Hi everyone,

My name is Jon Lanman and I'm a DPhil student at Oxford University studying atheism and humanism. Two of the main goals of my research are to get a better descriptive account of what individual non-theists/humanists think about a variety of topics and also to test a host of hypotheses from psychology and anthropology concerning how different factors of environment and upbringing can affect our beliefs. Towards that end, I have designed an interview/survey for individual non-theists/humanists.

The survey will ask you about your beliefs and experiences, as well as a variety of background/demographic matters. The survey should take somewhere between 35 minutes and 50 minutes to complete. You can, however, exit the survey and come back to it at a later time if you do not feel like answering the questions all at once.

Here is the link for the survey:

https://www.surveymonkey.com/s.aspx?sm=g0hUnwCEp0EYQMoRMVFK_2bQ_3d_3d


Thanks very much for your participation and feel free to contact me at jonathan.lanman@anthro.ox.ac.uk

Cheers,
Jon


I hope that my readers will help Jon out in his research.
