
Flaws in studies about blue tit plumage motivated a push for more rigor in ecology.


Psychology’s replication crisis inspires ecologists to push for more reliable research

Ecologists love to study blue tits. The birds readily nest in boxes in the wild and have striking plumage that seems ideal for testing ideas about the evolutionary point of the ornamentation. Dozens of studies have reported that male coloring is substantially different from that of females, that females choose mates based on differences in that coloring, and that male plumage is a signal of mate quality.

But Tim Parker, an ecologist at Whitman College, wasn’t so sure. In a 2013 meta-analysis of 48 studies on blue tit plumage, Parker found many researchers had cherry-picked the strongest findings from data they had sliced and diced. They had worked backward from results to form hypotheses that fit the data. And reams of boring, negative results were missing from the published picture. There was no reason to think these problems were limited to blue tits, Parker says: “I just became convinced that there was a lot of unreliable stuff out there.”

Parker soon found an ally in Shinichi Nakagawa, an ecologist at the University of New South Wales with similar concerns. “It’s an existential crisis for us,” Nakagawa says. The two began to publish on the issue and gathered more collaborators. That effort culminated last week in the launch of the Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology (SORTEE), dedicated to connecting ecologists who want to add rigor to their field. SORTEE draws inspiration from the Society for the Improvement of Psychological Science (SIPS), a group in psychology, another discipline that has wrestled with reliability. SORTEE plans to host satellite events at ecology conferences and, eventually, its own meetings.

Although SORTEE’s agenda will be set by its members, Parker says that, like SIPS, it could offer statistics training, build collaborations, and support metaresearch on the health of the discipline. Yolanda Wiersma, a landscape ecologist at the Memorial University of Newfoundland who is not involved with SORTEE, is eager to see whether the society makes a difference. Research credibility is something “we haven’t wrapped our heads around completely as ecologists,” she says.

Ecology suffers from many of the same underlying problems as psychology. Surveys of the ecology literature have found that small sample sizes are common, often driven by high cost or limited access to a species or other study system. In landscape ecology, each landscape is unique, meaning the sample size is one, Wiersma says. “There is one Yellowstone park,” she says. “There’s one Lake District.” Small samples lead to erratic results that sometimes miss the effects researchers are looking for and other times hit on noise that looks like a real signal.
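The effect of sample size on reliability can be illustrated with a small simulation. The sketch below is not drawn from any study mentioned here; the true effect of 0.5 standard deviations, the group sizes of 10 and 100, and the simple z-test with a known standard deviation of 1 are all assumptions chosen for illustration.

```python
import random
import statistics
from math import sqrt


def simulate_studies(n_per_group, true_effect, n_studies=2000, seed=1):
    """Simulate many two-group studies.

    Returns (share of studies reaching significance,
             mean estimated effect among the significant studies).
    """
    rng = random.Random(seed)
    # Standard error of the difference in means, assuming a known SD of 1.
    se = sqrt(2.0 / n_per_group)
    significant_estimates = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        estimate = statistics.fmean(treated) - statistics.fmean(control)
        # Two-sided z-test at the 5% level.
        if abs(estimate) / se > 1.96:
            significant_estimates.append(estimate)
    rate = len(significant_estimates) / n_studies
    mean_sig = (
        statistics.fmean(significant_estimates)
        if significant_estimates
        else float("nan")
    )
    return rate, mean_sig


# Hypothetical scenarios: a real effect with small and large samples,
# and no effect at all with small samples.
small_rate, small_mean = simulate_studies(n_per_group=10, true_effect=0.5)
large_rate, large_mean = simulate_studies(n_per_group=100, true_effect=0.5)
null_rate, _ = simulate_studies(n_per_group=10, true_effect=0.0)

print(f"n=10,  real effect: detected in {small_rate:.0%} of studies, "
      f"mean significant estimate {small_mean:.2f}")
print(f"n=100, real effect: detected in {large_rate:.0%} of studies, "
      f"mean significant estimate {large_mean:.2f}")
print(f"n=10,  no effect:   'detected' in {null_rate:.0%} of studies")
```

Under these assumptions, the small studies miss the real effect most of the time, and the ones that do reach significance overstate it, because only unusually large estimates clear the threshold. A few studies with no underlying effect also come out significant by chance, which is the noise that looks like a real signal.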

Worsening those problems are “questionable research practices,” says Fiona Fidler, a metascientist at the University of Melbourne. In a 2018 study published in PLOS ONE, Parker, Fidler, and colleagues reported on a survey of more than 800 ecologists and evolutionary biologists. About half of the respondents said they sometimes presented unexpected findings as if they confirmed a hypothesis they’d had all along, and about two-thirds said they sometimes reported only significant results, leaving out negative ones. Together, these forces mean a literature overflowing with potentially dubious results, Parker says. It’s a “house of cards.”

But unlike psychology, in which researchers have tried to replicate famous studies and failed in about half the cases, ecology has no smoking gun. A 2019 PeerJ study found only 11 replication studies among nearly 40,000 ecology and evolutionary biology papers, and only four of these 11 studies managed to replicate the original finding. It’s hard to replicate ecology studies, Parker says, because it often entails expensive and difficult data gathering in remote places or over long time frames. And ecosystems are so complex that any number of variables could affect the outcome of a repeat experiment, such as the age of the organisms in the study, the temperatures at the time, or the presence or absence of pollutants. “No man can step into the same river twice because it’s not the same man and it’s not the same river,” says Phillip Williamson, an ecologist at the University of East Anglia who has criticized a high-profile effort to replicate ocean acidification research.

Yet Williamson doesn’t think ecology as a whole is at risk just because some experiments fail to replicate. “Biology isn’t physics,” he says. “I think that the consensus of science gets there eventually.” Parker takes a harder line. “If we don’t expect anything to replicate, why do we bother doing any of this?” he asks.

Even before they set up SORTEE, Parker and his co-revolutionaries were pushing for change. They worked with journal editors to create checklists for details that papers should include, such as whether researchers were blinded to the conditions of different subject groups. They’ve also set up a preprint server that Nakagawa hopes will help preserve results that never make it into journals. Julia Jones, a conservation scientist at Bangor University who is not involved with SORTEE, is advocating for preregistration, which forces a researcher to commit to a data collection plan and hypothesis before the study begins. Some journals offer registered reports: peer-reviewed preregistrations with a commitment to publish the results, however dull or dazzling. Preregistration isn’t always possible, because the vagaries of fieldwork often force researchers to change plans. But Jones says it can help scientists avoid the “siren song” of looking for a clean story in messy data.

In April, Jones and her colleagues published the first registered report for the journal Conservation Biology. They analyzed extra data from a randomized controlled trial in Bolivia’s highlands that had already found that paying farmers to keep their cattle out of rivers did not improve water quality. The team found other interesting behavior changes, such as farmers keeping their cattle on their farms rather than letting them roam the forests, but many results were not statistically significant. In a normal review process, “we would have been forced to cherry-pick and tell a much simpler story,” Jones says.

Others are working to address the sample size problems by gathering massive amounts of data using consistent methods. They hope the data sets will make it easier to see which findings apply beyond a single ecosystem. The U.S. National Ecological Observatory Network (NEON), a continentwide program of more than 100 heavily instrumented field sites, became fully operational in 2019, and the first studies drawing on its data are now underway.

The Nutrient Network (NutNet), cofounded by University of Minnesota, Twin Cities, ecologist Elizabeth Borer, also pulls in large amounts of standardized data, to explore how changes in nutrients and herbivores affect grassland plant diversity. Rather than building infrastructure like NEON, NutNet gets research teams around the world to perform the same experiments—in return for access to a huge data set. Unifying experiments is hard, Borer says. For instance, the team discovered that the fertilizer brand Micromax had slightly different micronutrient mixes on different continents, forcing researchers to import or mix their own.

Borer, Wiersma, and Jones are all sympathetic to SORTEE’s aims—and curious to see whether it takes off. Like the systems they study, ecologists can be fragmented, and developing sound research principles sometimes seems impossible, Wiersma says. “But I think maybe we could,” she says. “We just need to try harder.”