Big research collaborations have become common—think Human Genome Project, Mars rovers, the new BRAIN Initiative—but they are almost unknown in psychology. Most psychological experiments are carried out by a single lab group, often just a few researchers. But several collaborations that span dozens of psychology laboratories around the world have recently formed. Their goal is nothing short of testing the reproducibility of psychological science. The first significant result from one of those alliances was released this week, and psychologists are breathing a sigh of relief that their field came through with relatively minor blemishes—10 of 13 experimental results were replicated.
Reproducibility is a mantra in science. For most types of research, if an experimental result can't be reproduced by another lab, then its credibility is undermined. Fail to reproduce in multiple labs and the original result is dismissed. Testing the reproducibility of experiments is crucial for cleaning out scientific errors, flukes, and fraud. But science doesn't run as efficient a cleaning service as it could. Researchers are given almost no professional incentive to repeat the work of others, let alone report failures to repeat their own experiments.
Now, motivated by several recent high-profile frauds and an overall concern that many of their field’s results aren’t trustworthy, some experimental psychologists are doing an audit. The one announced this week started with a trio: Brian Nosek at the University of Virginia in Charlottesville, Kate Ratliff at the University of Florida in Gainesville, and Ratliff's Ph.D. student Rick Klein. Nosek has been at the forefront of efforts to clean up his field—he and more than 175 collaborators are repeating a random sample of the hundreds of studies published in 2008 in three major psychology journals—and he and Ratliff are both part of Project Implicit, a long-running collaboration that also provides free software for running behavioral experiments with standardized methods. Nosek wanted to use the software to see just how reproducible classic psychological experiments are. "I asked [Klein and Ratliff] if they would be interested in trying to scale up this idea and recruit other laboratories to get involved." They agreed.
So the team made their intentions public by preregistering their study and submitting the idea to a special issue of Social Psychology focused on experimental replication slated to come out in spring 2014. (Nosek is an editor of that issue, so his submission was handled by his co-editor.) ScienceInsider interviewed Nosek earlier this year about the project and his ambitions to make scientific research more open and accountable.
"At that point we didn't really know how many other labs would get involved," Nosek tells ScienceInsider in an e-mail. "But when we sent out invitations for contributors in February and then again in July … a lot of colleagues jumped aboard." Ultimately the partnership, which dubbed itself the Many Labs Replication Project, consisted of 36 laboratories—25 based in the United States and 11 in other countries. The experiments, all of which involved short interactive tasks on computers, were run in a total of seven languages. "Rick Klein played a heroic role in coordinating all of the teams," Nosek says. The group chose a set of 13 published experiments to replicate. Some are classics, such as one that revealed the anchoring effect, a bias in how people use information to make decisions. Others were published more recently, such as flag priming, a result that attracted significant media attention. The team ran these experiments on a total of 6344 subjects. Not only was each experiment tested in different laboratories around the world, but nine of the collaborators also conducted the experiments over the Internet rather than live in the lab. So if an experiment failed to replicate, it would be possible to tease out whether the failure was due to the population being tested, the testing environment, or the experiment itself.
The results are good news for psychology. Ten of the 13 experiments replicated, and for five of them the statistical effect was stronger than originally observed. The three that failed—two dramatically and one marginally—were all published within the past 3 years. The flag priming experiment was among the failures. And it hardly mattered whether the experiments were performed live in a lab or online, or whether the subjects were American or international. The three failures to replicate seem to be due to the experiments themselves: The effects may just not be real. The entire study, from the original proposal to the data, is now available online.
For Nosek, the specific results are less important than the collaboration itself. "Crowd-sourcing science can be efficient and effective," he says. "We managed to complete data collection from all of the sites within 4 months. Total cost? $2000 for project coordination, and then each laboratory donated their time and resources for their data collections." He hopes that psychologists will be inspired to follow the project’s lead.
Reactions from psychologists not directly involved in the collaboration have been largely positive. "I consider it a heroic effort that is worth recognition and praise," says Anthony Greenwald, a psychologist at the University of Washington, Seattle, but "it's hard to say that this model will be widely adopted. It's quite an effortful process. Many may find the rewards insufficient for the effort. And it remains to be seen whether journals are receptive to publishing articles like this more than occasionally." Several researchers contacted by ScienceInsider echoed that doubt. "Hopefully we'll see a few more multilab studies like this from time to time," says Hal Pashler, a psychologist at the University of California, San Diego, "but it's not going to be possible to do this on a routine basis."
But others see this as the future of research. The results “pave the way for replications to enter the new-normal way of doing research," says Uri Simonsohn, a psychologist at the University of Pennsylvania. "If a new original study is worth publishing, its replication is worth running."