Who's Looking at Your Data?

These days, it is commonplace for funding agencies to require their funded scientists to agree to share data collected during the term of the grant. Journals, too, are requiring authors to submit primary data with manuscript submissions. For the most part, such policies are uncontroversial; there are numerous benefits to sharing data and many are widely recognized by researchers. But one issue that has so far been overlooked is the negative impact that data sharing can have on early-career researchers.

In the United Kingdom, all the major government funding agencies require the sharing of data. The various agencies allow an embargo period that ranges from 6 months to 3 years, during which scientists maintain exclusive access to their primary data. At some agencies (e.g., the Natural Environment Research Council [NERC]), the embargo period of up to 2 years begins when the experiments are completed, not when the grant ends. That may not give researchers much time to fully exploit the data's potential before it has to be shared with competing labs.

The major fear is that shared data could be "scooped, poached or misused." Getting scooped by your own data would be annoying no matter your career stage, but the impact would be especially large for an early-career researcher.

How sharing impacts early-career scientists

Consider a hypothetical situation: A young, tenure-track researcher approaches the end of their first research grant. The tenure decision lies just over the horizon. The funded project is just starting to gel. High-quality data has been collected, but—due to open questions and perhaps the researcher's inexperience—only a handful of publications have emerged so far. The grant ends, and—these being difficult budgetary times—the renewal proposal isn't funded.

If exclusive access to the data could be guaranteed for a few more months, our young researcher might be able to pull together the various threads of the project and produce at least one high-profile publication, which might be enough to secure tenure and a career. Instead, the data must be made public right away, just as the money runs out.

So instead—in our hypothetical scenario—a better-funded competing lab, led by an experienced researcher who has long taken an interest in this work, instructs a postdoc to stop other work and focus on excavating the newly shared data. Buried deep within the data, a crucial piece of a scientific puzzle is discovered. A paper is written, submitted to Science, and accepted for publication. For science, this is a positive outcome—a productive collaboration of a sort—but for the promising young researcher it probably ends badly.

The U.S. National Institutes of Health has issued draft requirements for sharing genomic data that would, if implemented, require nothing more than an acknowledgement in published work that uses the shared data. Many U.K.-based research councils (e.g., the Biotechnology and Biological Sciences Research Council) state that when data are shared through a third-party resource or database, authors need only acknowledge the source. When data are shared directly by the originator, either joint authorship or acknowledgement may be appropriate.

Experienced researchers are far less vulnerable to such policies, since they are likely to be tenured and to be managing several projects at once. Their trainees, though—the graduate students and postdocs who work in their laboratories—are quite vulnerable: Like early-career investigators but to an even greater degree, they are much more likely to have all their eggs in that one scientific basket.

Stephanie Pierce (L) and Steven Portugal

Courtesy of the subjects

We encourage funding agencies to address the interests of early-career researchers—including graduate students and postdocs—in setting, updating, and enforcing their data-sharing policies. Questions that need to be answered include:

• Prior to utilizing shared data, should permission be sought from the funding body, the journal where the data were first utilized, or from the scientist who collected the original data?

• Should the data collector be acknowledged or included as an author? To us, mere acknowledgement does not seem sufficient; surely sophisticated data collection counts as an intellectual contribution to the work and should be rewarded with an offer of authorship.

Coping with sharing

While much depends on the policies of journals, funding agencies, and governments, there are things early-career researchers can do to help themselves. The objective is to maintain a delicate balance between adherence to data-sharing policies and protecting their own career-related goals. Here are some steps that can be taken to preserve your data and future publication plans, while allowing journal editors, granting bodies, and line managers to maintain their equanimity:

(1) At the start of a project, sit down with your project partners (and your line manager if you have one), and discuss the publications you are hoping to produce from the project and how, as a team, you will manage your data. In science, flexibility is essential, but an aspirational framework helps, and a timeline is useful or even necessary. The more you work out in advance, the less disagreement there will be moving forward.

(2) Those negotiations should include a plan for what data will be made available, when, and in what format. Strike a balance between the line manager's obligation to meet the data-sharing terms of journals and funding agencies and your own career progression.

(3) If you have any concerns about the provision of raw data accompanying a manuscript submission—e.g., the impact of sharing on future publications—talk to the journal editor. Similarly, if you have uncertainties about the timing of data availability, clarify them with the relevant funding body. Interact with editors and funding officials, and work out an arrangement that works for everyone.

(4) Be clear about what you are allowing your data set to be used for and about the length of any embargo you set. It is reasonable (and often permissible) to embargo your data while you write and submit journal articles. Be realistic about your timeframe; things always take longer than expected. NERC will permit up to a 2-year embargo period to "allow researchers a reasonable amount of time to work-up their data sets and publish their findings." Longer embargo periods can be granted in exceptional circumstances, so again the message is: talk to the relevant funding agency.

(5) Many scientists who share data associated with publications will do so via a data repository such as Dryad. (Also consider the Digital Curation Centre as a resource.) Under normal circumstances, storing data at one of these repositories equates to giving permission for instant access, but a 1-year embargo can be granted in extenuating circumstances. Importantly, data repositories assign digital object identifiers (DOIs) to data sets, which means peers are required to reference them in any resultant publications. Such a system helps to ensure that your work is properly cited, which should help your career progression. At present, though, data repositories like Dryad store only information affiliated with a publication; it is still unclear how credit would be given for raw data deposited in an archive.

(6) If you're already independent and face the prospect of being forced to share your hard-earned data, comply with all mandatory policies—but no sooner than you have to. If you have concerns, work with the funding agency. Win the support of agency staff. Seek extensions. Do whatever you can to protect your professional interests without breaking any rules. And, most important, publish those results as soon as possible.

In a way, we regret having to make recommendations like these. Data sharing is good for science. Even the hypothetical scenario outlined above led to an outcome that was good for science—just not for one early-career scientist. Ultimately the problem is one of incentives: Our current system puts too much emphasis on individual achievement and too often penalizes high-minded principle. Yet, scientists who wish to make important contributions must succeed according to the system's current rules. Policymakers and young scientists alike must balance the need for change against the current realities of our scientific culture.