In a provocative paper published this week, researchers say they have figured out a way to link a person's DNA to their anonymous genetic data in a certain kind of public research database. But the National Institutes of Health (NIH), which hosts one of the largest such databases, says it's not taking any new steps to prevent someone from using the method to breach privacy. That contrasts with NIH's response 4 years ago, when a similar study prompted the agency to pull genetic data from its public Web sites.
The issue then involved studies that compare DNA variants called single nucleotide polymorphisms (SNPs) in people with and without an illness to find disease risk markers. NIH had begun posting pooled SNP results from hundreds of people online, on the assumption that privacy would not be breached. But then scientists reported in PLoS Genetics that if they had a sample of an individual's DNA, they could link it to that person's SNP results within a public DNA pool. NIH (and the Wellcome Trust) removed data from public sites; NIH now allows only approved researchers to download pooled data from SNP disease studies.
Such access barriers are less common for a different type of genetic data: measures of gene activity derived by analyzing RNA levels in a tissue sample. Because this gene expression data wasn't thought to be traceable to an individual, researchers have routinely deposited RNA results in public databases. One example is NIH's Gene Expression Omnibus (GEO) database, which holds nearly 1000 datasets for gene expression tests on human tissues. Anyone can look up data for individuals who participated in, say, a study on breast cancer or childhood obesity.
Now it seems that this RNA data can be linked to a person's DNA after all. Eric Schadt and colleagues at Mount Sinai School of Medicine in New York City reported this week in Nature Genetics that they have developed a technique for generating a personal SNP profile, or a DNA "bar code," for an individual based on their gene expression results. This means that, in principle, someone who had a DNA sample from a participant in a study stored in GEO could derive SNP bar codes from the study's expression data, match the DNA sample to one of them, and look at that participant's biological data.
Despite implications similar to those of the 2008 PLoS Genetics paper (a remote but real possibility that research participants could be identified), NIH isn't as concerned this time. In a statement, the agency said that while NIH leaders "will be reviewing the finding" and its implications, "NIH sees no need to modify its data sharing practices at this time." National Human Genome Research Institute spokesperson Larry Thompson explains that the risks seem low because what Schadt's group did requires "a complex statistical tool" and "it's not an easy thing to do." NIH's attitude was different 4 years ago, he says, because it "was the first time," and NIH felt it should "go to the extremes of caution."
Schadt says he didn't expect NIH to impose new limits on data access. His message, he says, was "to highlight that in fact there may be no way to protect privacy" of individual genetic data. Rather than blocking access, Schadt says, NIH should educate people that their data may not remain confidential and rely on "downstream" protections such as genetic antidiscrimination laws.
Attorney Dan Vorhaus, who runs the blog Genomics Law Report, agrees that "the idea that we can promise a complete separation from data and identity is now largely discredited." Vorhaus says that NIH should update its data-sharing policy to require that study volunteers be told that the privacy of their genetic data can't be guaranteed. "Participants should understand the risk and be free to assume that risk if they wish," Vorhaus says.