Old Data Play Hard to Get, Study Finds

The older the raw data, the harder they are to get your hands on. That’s the perhaps-not-surprising message of a new study by a group of ecologists and evolutionary biologists, who set out to track down the authors of 516 papers published between 2 and 22 years ago.

Evolutionary biologist Timothy Vines, of the University of British Columbia, Vancouver, in Canada, got the idea for the project after finishing up a paper late last year about how archiving policies at journals affected the availability of data. Vines began wondering about a broader question: How fast do data (or the people generating them) disappear?

Vines and his colleagues focused on a type of data collection that hasn’t changed all that much: certain types of morphological studies of plants and animals. They selected 516 papers published after 1990, examining only those that appeared in odd-numbered years to make their list more manageable, and then searched online for the authors’ e-mail addresses.

In one sense, it was tough to gather data regardless of when the paper was published. Of the 167 papers published before 2000, 38% had no working author e-mail; for the 349 papers published after 2000, the figure dropped to 19%. For papers where an e-mail apparently got through, Vines and his colleagues received a response about half the time, regardless of when the paper was published.

Vines suspects that some e-mails, particularly on older papers, didn’t go through. Authors of older papers were much more likely to admit that their data had been lost. Statistical analysis of the results suggested that for every extra year a paper had been in circulation, the odds that its data were still around declined by 17%. For only two of the 26 papers from 1991 did Vines and his colleagues determine that data still existed; the share rose steadily to nearly 40% by 2011 (and would likely have been much higher if more authors had responded to their e-mails).
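A 17% annual decline in odds is easy to misread as a 17% drop in probability. The short Python sketch below works through the arithmetic, assuming the decline compounds at a constant rate each year. The function names and the 50% starting probability are hypothetical choices made for illustration; they are not taken from the study.

```python
# Illustrative sketch only: compounds the 17% annual decline in odds
# reported in the study. The starting probability used below is a
# hypothetical value, not a figure from the paper.

ANNUAL_ODDS_DECLINE = 0.17  # reported decline in odds per extra year


def odds_multiplier(years):
    """Factor by which the odds shrink after the given number of years."""
    return (1.0 - ANNUAL_ODDS_DECLINE) ** years


def probability_after(start_probability, years):
    """Convert a starting probability to odds, apply the decline, convert back."""
    if not 0.0 < start_probability < 1.0:
        raise ValueError("start_probability must be strictly between 0 and 1")
    start_odds = start_probability / (1.0 - start_probability)
    odds = start_odds * odds_multiplier(years)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    hypothetical_start = 0.5  # assumed for illustration only
    for years in (0, 5, 10, 20):
        p = probability_after(hypothetical_start, years)
        print(f"after {years:2d} years: probability {p:.3f}")
```

Because the decline applies to odds rather than to probability, the probability falls by a different amount each year depending on where it starts.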

Vines notes that the study, published today in Current Biology, has its limitations. Many authors might simply have ignored the e-mails requesting data. “If we had told them, ‘Your research funding will stop right now if you don’t give us your data,’ clearly we would have had a higher response rate,” he admits. Still, there’s no doubt that data are disappearing, whether because researchers become difficult to find or because, as Vines also found, older data are stored on obsolete media such as floppy disks.

“Everyone sort of thinks this is happening and quietly acknowledges it, but I think it’s important to drag it into the light,” Vines says. Some data sets, such as fieldwork in ecology, are “irreplaceable,” he says. Some are costly to redo. Finally, “if your research is paid for by public money, in some sense the data doesn’t belong to the authors,” Vines argues. “It belongs to the people who paid for it.”