In 2011, a striking psychology paper made a splash across social media, news, and academia: People used the internet as a form of “external” memory, the study said, relying on it for information rather than recalling facts themselves. In 2018, a key finding from that paper failed to replicate when a team of psychologists put it and 20 other high-profile social science studies to the test.
But the original paper has been cited 1,417 times, with more than 400 of those citations coming after the 2018 replication project. That’s far more, on average, than the papers from the project that did replicate. Now, a new study confirms the popularity of unreliable studies: Social science papers that failed to replicate racked up 153 more citations, on average, than papers that replicated successfully.
This latest result is “pretty damning,” says University of Maryland, College Park, cognitive scientist Michael Dougherty, who was not involved with the research. “Citation counts have long been treated as a proxy for research quality,” he says, so the finding that less reliable research is cited more points to a “fundamental problem” with how such work is evaluated.
University of California, San Diego, economists Marta Serra-Garcia and Uri Gneezy were interested in whether catchy research ideas would get more attention than mundane ones, even if they were less likely to be true. So they gathered data on 80 papers from three different projects that had tried to replicate important social science findings, with varying levels of success.
Citation counts on Google Scholar were significantly higher for the papers that failed to replicate, they report today in Science Advances, with an average boost of 16 extra citations per year. That’s a big number, Serra-Garcia and Gneezy say: Papers in high-impact journals in the same time period amassed about 40 citations per year on average.
And when the researchers examined citations in papers published after the landmark replication projects, they found that the papers rarely acknowledged the failure to replicate, mentioning it only 12% of the time.
A failed replication doesn’t necessarily mean the original finding was false, Serra-Garcia points out. Changes in methods and evolving habits among participants—like changing patterns of internet use—may explain why an old result might not hold up. But she adds that her findings point to a fundamental tension in research: Scientists want their work to be accurate, but they also want to publish results that are attention grabbing. It might be that peer reviewers lower their bar for evidence when the results are particularly surprising or exciting, she says, which could mean striking results and weaker evidence often go hand in hand.
The guideline that “extraordinary claims require extraordinary evidence” seems to soften when it comes to publication decisions, agrees Massey University computational biologist Thomas Pfeiffer, who studies replication issues, but was not involved with this work. That points to the need for extra safeguards to bolster the credibility of published work, he says—like a higher threshold for what counts as good evidence, and more effort to focus on strong research questions and methods, rather than flashy findings.
“The finding is catnip for [research] culture change advocates like me,” says Brian Nosek, a psychologist at the University of Virginia who has spearheaded a number of replication efforts and was a co-author on two of the three replication projects that Serra-Garcia and Gneezy drew from. But before taking it too seriously, it’s worth seeing whether this finding itself can be replicated using different samples of papers, he says.
The result falls in line with previous studies that suggest popular research is less reliable. A 2011 study in Infection and Immunity, for example, found that high-impact journals have higher retraction rates than lower impact ones. And Dougherty’s research—currently an unreviewed preprint—has found that more highly cited papers were based on weaker data, he says. But a 2020 paper in the Proceedings of the National Academy of Sciences that looked at a different sample of papers found no relationship between citation and replication. That suggests the sample of papers could really matter, Pfeiffer says—for instance, the effect could be particularly strong in high-impact journals.
Nosek adds that stronger but less sensational papers may still accrue more citations over the long haul, if the popularity contest of striking results burns out: “We’ve all seen enough teen movies to know that the popular kid loses in the end to the brainy geek. Maybe scientific findings operate in the same way: Credible ones don’t get noticed as much, but they do persist and win in the end.”