We know that “Dancing with the Stars” can transform celebrities from clumsy beginners to polished dancers, but did you know that the popular television show also reveals a significant unconscious bias in human judgment?
Kieran O’Connor and Amar Cheema, both professors in the University of Virginia’s McIntire School of Commerce, built a new paper around the show – one now drawing attention in the academic world and in publications like The Economist.
In it, they show that when judges evaluate something repeatedly over time, they tend to give higher and higher scores – whether they are judging a dance competition, an essay or some other performance.
On “Dancing with the Stars,” which pairs celebrities with professional dance partners for weekly performances, O’Connor and Cheema analyzed 5,511 scores from the show’s three judges over 20 seasons – two seasons per year for 10 years. They found that each judge’s average score increased season by season, even after they controlled for the experience and quality of the professional dancers.
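The kind of analysis described above can be sketched with a simple regression: estimate the per-season trend in scores while controlling for a covariate like partner experience. The sketch below uses simulated data, not the authors' dataset, and the drift and effect sizes are invented for illustration.

```python
import numpy as np

# Illustrative sketch only -- simulated data, not the paper's dataset.
# Mimics the approach described above: regress judges' scores on season
# number while controlling for a partner-experience covariate.
rng = np.random.default_rng(0)

n = 5511                                   # number of scores analyzed in the paper
season = rng.integers(1, 21, size=n)       # seasons 1..20
experience = rng.integers(0, 15, size=n)   # hypothetical partner experience

# Assumed data-generating process: a small per-season drift (+0.05 points)
# plus an experience effect and noise.
score = 6.0 + 0.05 * season + 0.10 * experience + rng.normal(0, 0.8, size=n)

# Ordinary least squares with an intercept and both predictors.
X = np.column_stack([np.ones(n), season, experience])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

print(f"per-season trend: {beta[1]:.3f}")
print(f"experience effect: {beta[2]:.3f}")
```

With this many observations, the fit recovers the simulated per-season drift even in the presence of the experience effect – the same logic that lets the real analysis separate a judging trend from dancer improvement.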
According to O’Connor and Cheema, it’s a problem of misattribution.
The more judges evaluate something like a dance, O’Connor said, the less onerous the evaluation process itself feels. The process of judging gets easier, faster, more familiar and less mentally taxing. The problem comes when judges attribute that newfound ease to the quality of the performance, rather than to their own growing proficiency as judges.
“We make this unconscious inference that if something feels easier to judge, it must be better,” O’Connor said. “We mistakenly equate how easy something feels to process cognitively with its quality or value.”
The judges on “Dancing with the Stars,” the researchers concluded, were unconsciously giving higher scores over time not necessarily because the dances were better, but because they were more comfortable with the judging process.
To double-check their theory, O’Connor and Cheema systematically ruled out alternative explanations. First, they chose “Dancing with the Stars” because the three core judges – Len Goodman, Carrie Ann Inaba and Bruno Tonioli – had been with the show from its first episode in 2005 and appeared, with few exceptions, in every episode that has followed.
“We were drawn to ‘Dancing with the Stars’ because these three judges were stable across 20 seasons,” O’Connor said. “Anytime you study something like this outside the lab, there are always complications with the richness of data and potential alternative explanations. But this context allowed us to control for a lot of them.”
The judges’ longevity eliminated the risk of inconsistency among different judges and also allowed O’Connor and Cheema to compare the full-time judges’ scores with guest judges’ scores over time. They found no positive trend among guest judges’ scores from season to season.
They also controlled for the professional dance partners’ years of experience on the show, showing that the steady increase in scores was not significantly related to increased proficiency among the dance partners.
Additionally, they looked at professional dancers who had appeared in at least one of the first 10 seasons and one of the last 10, to control for the possibility that the show was attracting higher-quality dancers as time went on. O’Connor also noted that the show’s U.S. Nielsen ratings have declined over time, which would intuitively indicate that the quality of the performances has not improved dramatically.
Then, they tested their theory in other fields and in the laboratory.
“After we found this, it really hit us that this could be true anywhere,” O’Connor said. “Once you hear that judges in all contexts may give more positive evaluations over time, people instantly start coming up with examples.”
Other examples in the paper and its supplementary materials include grading in classes taught for several successive semesters, Amazon product reviews over time, and controlled experiments in which O’Connor and Cheema had participants evaluate randomly selected photographs and short stories over several weeks to rule out other alternative explanations.
Each time, they found that the effect held: the more evaluations someone makes, the more positive their evaluations become.
According to data from a large U.S. university, when professors taught a class several times in succession – the researchers’ data set included 1,854 courses over a 10-year period – the average grade for that course tended to rise in later offerings. The effect did not necessarily produce significant changes in individual students’ grades year to year, but it did fuel an upward trend when courses were offered multiple times.
“Across all studies, the data seemed to suggest a linear pattern,” O’Connor said.
On Amazon, O’Connor and Cheema studied 865 top reviewers, each of whom had between 10 and 100 reviews. The reviewers consistently gave more positive reviews over successive reviews, a pattern that persisted even after controlling for product quality.
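One way to picture that within-reviewer pattern is to fit a trend line to each reviewer's ratings in the order they were written and average the slopes. The sketch below is loosely modeled on the analysis described above; the reviewer counts match the article, but the drift size and rating model are invented for illustration.

```python
import numpy as np

# Hedged sketch of a within-reviewer trend check -- simulated, not real
# Amazon data. Each simulated reviewer's ratings creep up slightly with
# every successive review.
rng = np.random.default_rng(1)

slopes = []
for _ in range(865):                        # 865 top reviewers in the study
    k = rng.integers(10, 101)               # each with 10-100 reviews
    idx = np.arange(k)                      # review order: 0, 1, 2, ...
    # Assumed process: +0.01 stars of drift per review, plus noise.
    rating = 3.5 + 0.01 * idx + rng.normal(0, 0.7, size=k)
    # Per-reviewer least-squares slope of rating on review order.
    slopes.append(np.polyfit(idx, rating, 1)[0])

mean_slope = float(np.mean(slopes))
print(f"average per-review drift: {mean_slope:.4f}")
```

Averaged over hundreds of reviewers, the noisy individual slopes converge on the simulated positive drift – the signature the researchers looked for in the real review data.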
In the lab, participants were asked to evaluate randomly selected short stories and photographs every day for two weeks. In surveys, they consistently said that the evaluation process was getting easier, but they insisted that their evaluations were not becoming more positive. The numbers, however, showed otherwise. Participants consistently mistook the increased ease of evaluation for increased quality, and their evaluations did indeed grow more positive over time.
“That told us that there is a disconnect between what we believe about our behavior as judges and what our behavior actually says,” O’Connor said.
That, O’Connor said, might actually be the most important point of the paper. Once we know this bias exists, we will be better equipped to address it in important situations.
“If the goal is fair assessment – and that is an important goal in everything from dance competitions to grades to any other evaluation – highlighting this bias and making it known can help to counteract the bias itself.”