Chasing Down the Data You Need

Elizabeth Wolkovich

It's the time of year when, in much of the Northern Hemisphere, the earth explodes in a profusion of blossoms and new growth. In 2010, Elizabeth "Lizzie" Wolkovich, who at the time was a postdoctoral fellow at the University of California, San Diego, wanted to know how the timing of this annual event—the blooming of flowers and other signs of spring—is changing. To complete her research, she would need to gather data from lots of different sources—collected by many different researchers—to determine whether climate change was having an effect on flower blooming.

No single ecology database held all the data she needed. So Wolkovich, who is now an assistant professor at Harvard University, went old school. She searched through thousands of papers that analyzed the timing of blooming over long periods of time. "I sent hundreds of e-mails to researchers, asking to use their data. Most people were fantastic. A few individuals wanted authorship in exchange for their data, and there were a few data sets that we just could not get," Wolkovich says.

She particularly wanted to use the data of agricultural researcher Melvin McCarty, who had collected several decades of plant data in Nevada. McCarty had died, so Wolkovich reached out to his colleagues to see whether any of his data was still available. It wasn’t.

"I think [McCarty's] data would have added so much insight to my project, but it was just lost. Especially in climate research, this old data is especially valuable because it tells us what the planet was like before. Now that things are changing, we can’t recollect data and expect it to be the same," Wolkovich says.

Lately data sharing has been touted as a next big thing in science, and scientists like Wolkovich believe that open science will indeed benefit scientific progress. Meanwhile, Wolkovich and others are left to contend with the difficult issue of trying to access other scientists’ data.

Sometimes it's easy. Often it's difficult. More often than we might wish, it's impossible. Still, although no action can guarantee that researchers will share their data, there are ways to improve your odds.

Jonathan Petters

Courtesy of Jonathan Petters

The first step, says Jonathan Petters, a data manager at Johns Hopkins University in Baltimore, Maryland, is to check all the open access databases: Dryad, figshare, a National Oceanic and Atmospheric Administration/NASA server, or even an institutional repository. "It can save you the bother of asking to use the data," Petters says.

Although these sites contain lots of data, they might not contain the data you need. In that case, go straight to the source. Personal, Petters says, is always better. Whether it's a conversation at a conference, a discussion over coffee, a phone call, or even an e-mail—the more personal and specific you can be, the better.

Gradually raise the pressure

Mark Parsons tells scientists requesting data to be prepared to disclose exactly how and why they intend to use it. As secretary general of the Research Data Alliance and former data manager of the International Polar Year, Parsons knows firsthand the worries about data misuse by people who deny that climate change is occurring. And it’s not just climate scientists. "Lots of researchers worry that their data may be somehow inadvertently misused or misrepresented," Parsons says.

Spelling out exactly how you intend to use the data and why you need this particular data set can help dispel any fears the researcher may have. It can also clarify the process so that it goes more quickly and smoothly, Parsons says. Introducing yourself scientifically, either with a CV or some other summary of your work background, may also help reassure reluctant scientists.

If researchers agree to share their data but they haven’t made it publicly available, ask them to do so, Petters suggests. "With some effort from the researcher who generated the data, many, many other researchers can get access, and they won’t have to field individual requests for it," Petters says. Scientists may also appreciate offers to help with this endeavor as a token of appreciation for sharing their data with you.

Still, scientists are notoriously busy and may not reply to an e-mail because they didn’t see it, aren’t interested in sharing, or just don't want to deal with the extra work. Carly Strasser, a data curator at the California Digital Library at the University of California in Oakland, recommends slowly escalating the intensity of your requests. If the initial approach doesn't yield results, ask co-authors if they can nudge the principal investigator to share. An e-mail from an established collaborator is more likely to get read and acted on than a request from a total stranger.

If a private e-mail doesn’t work, Strasser says, try asking on social media or in other public spaces. "There is slightly more accountability but it's also potentially more aggressive," Strasser says.

A tactic like this has the potential to backfire, so tread carefully. But it may strengthen your case by enlisting others. "Other researchers may see the request and express their own interest, which increases the potential value of data sharing," Petters says.

Carly Strasser

Courtesy of Carly Strasser

If pressure isn't working, try some positive spin. Being the go-to source for data on a particular subject can raise a scientist’s profile.

If all these approaches fail, turn up the heat, Strasser advises. "Send a strongly worded e-mail that cites the requirements from the relevant funder to share data upon request. This works for scientists getting their funding from NSF [the National Science Foundation], NIH [the National Institutes of Health], and other federal agencies," Strasser says.

If the data still isn't forthcoming, Strasser recommends one last measure: Contact the scientist’s current governmental program officer. "This works particularly well with NSF-funded researchers. This might be considered a harsh step, but if you've tried other means, you have to rely on their need for future funding to get at the data," Strasser says.

Data's half-life

When it comes to requesting data, time is not on your side. Sometimes the data gets lost or misplaced. Other times it exists in a format that can’t be read with existing technologies. A 2013 Current Biology study by the University of British Columbia's Tim Vines et al. found that the availability of data decreases dramatically as the time since publication grows. Vines and colleagues requested data from 516 articles published 2 to 22 years before the study. They found that the odds of data still being available for sharing fell by 17% each year. In addition, the odds of finding a working e-mail address for an author fell by 7% each year.
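To get a feel for what those yearly rates mean, here is a rough back-of-the-envelope sketch in Python. It assumes, as a simplification that is not taken from the study itself, that each rate compounds at a constant pace year after year. The function names `fraction_remaining` and `half_life` are made up for this illustration.

```python
import math


def fraction_remaining(years, yearly_decline=0.17):
    """Share of the starting value left after `years`.

    Assumes a constant, compounding yearly decline (a simplification).
    """
    return (1.0 - yearly_decline) ** years


def half_life(yearly_decline):
    """Years until the value halves under the same compounding assumption."""
    return math.log(0.5) / math.log(1.0 - yearly_decline)


# The two rates quoted from the Vines study; the labels are illustrative.
for label, rate in [("data availability", 0.17), ("working e-mail address", 0.07)]:
    print(f"{label}: halves about every {half_life(rate):.1f} years")
```

Under that assumption, the 17% figure works out to a half-life of a little under four years, and the 7% figure to a little under ten. These numbers describe the compounding arithmetic only. They are not a prediction for any particular data set.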

Wolkovich had to complete her research without McCarty’s data, even though it would have added much value to her work. "It just wasn’t there anymore," she says. That's too bad, because she and other climate scientists rely on older data as a baseline against which to compare their more modern findings.

Should data-sharers be co-authors?

As Wolkovich found, even the simple step of asking to use another researcher’s data can bring up thorny issues. Before taking any other step, she says, have a plan for how you intend to deal with any questions or contingencies that may come up.

When she first sent out hundreds of queries to researchers around the world, Wolkovich got enthusiastic responses from many scientists; they handed over their data for her to use, no strings attached. Other scientists, however, wanted to be co-authors on any papers that Wolkovich authored using their data.

"There’s nothing wrong with that," Wolkovich says. She recognizes that they went to a lot of effort, sometimes over many decades, to get their results. "But you shouldn’t give authorship to some scientists and not others just because one group asked and the other didn’t," she adds.

Wolkovich had decided beforehand that the amount of data she needed was so large that she couldn’t realistically offer authorship to all the scientists. She would give credit where credit was due, but she was unable to give authorship, and she told the scientists that.

Some decided to let her use the results anyway. Others declined. Wolkovich was aware that this might happen and was prepared to move forward without the data. "You have to decide going in how you’re going to handle these types of situations, otherwise things can get unfair and out of control very quickly," Wolkovich says.

Still, Petters says, it’s important to acknowledge, both to yourself and to the data sharer, that incentives to share data aren’t that strong. Adding incentives may help encourage reluctant and busy scientists to share their work. Possibilities include authorship, a note in the acknowledgements section, or the newer option of citing the data directly. If you attend a conference together, take them to dinner or share a few drinks.

"Buying someone a beer never hurts," Strasser laughs.

Top Image: Elizabeth Wolkovich. Courtesy of Elizabeth Wolkovich
