Q&A: David Altshuler on How to Share Millions of Human Genomes

David Altshuler

Broad Institute

It's a comment made over and over by geneticists: To fully understand human genetic variation and its role in disease, researchers need to pool DNA and clinical data from millions of people. Earlier this week, more than 70 research, health care, and patient advocacy organizations, including big players such as the U.S. National Institutes of Health and the United Kingdom's Wellcome Trust Sanger Institute, announced a plan to do just that. Their proposed "global alliance," as they call it for now in a white paper, aims to develop standards, analogous to protocols for building Internet web pages, that will enable researchers around the world in fields from cancer to rare diseases to securely share and study patients' genome sequences and clinical information.

ScienceInsider spoke with geneticist David Altshuler of the Broad Institute in Cambridge, Massachusetts, who has led planning for the alliance, about its aims. An edited transcript follows:

Q: Why is this global alliance needed?

D.A.: A few years ago, there were a handful of genomes and today tens of thousands have been sequenced, or the exomes at least. In the coming years, millions will be sequenced. And we lack the ability, and will lack the ability for a long time, to look at a change in DNA that's observed in someone, a T here instead of an A, and predict what that would mean either biologically or clinically. We have to be in a position to compare genomes and clinical data if we want to learn and if we want to help people, like give them accurate predictions or learn the biology of a disease. The scale of the problem is that it's going to take millions of genomes. Even in a given disease there are often many different genes that can play a role, and there are many, many different diseases.

Q: Why aren't genomes already being shared?

D.A.: For a number of fundamental reasons. The first is that up until very recently, the general mindset has been that studies not just of genetics but clinical studies in general are often done by one investigator, one institution. They've been done disease by disease. It just wasn't anticipated that you would need to look across in this way.

Related to that is that there are no technical standards today for how you would manage and exchange genomic and clinical information of this sort. And there are huge issues of ethics, privacy, and regulations that are very, very important and that need to be developed in concert with the technical standards.

Q: How will this work—will you essentially merge different databases that contain large numbers of genomes?

D.A.: The way these things have been talked about in the past is the idea of building a big database. That's not what this is about. There are big ones, the UK Biobank, the Kaiser Permanente biobank. They will exist and they're very valuable. But we're inspired by the example of the Internet and the World Wide Web and also the Human Genome Project, where different parties working together created a networked ecosystem in which innovation occurred much more rapidly. The idea is to focus not on the creation of individual data sets but to focus on the standards and shared principles and ethics that would make it possible for many people to build things that would be individually innovative and yet collectively could learn from each other. That's what this is really about.

Q: The goals in the white paper are broad and ambitious—not only creating new data standards, but harmonizing regulations and developing new types of informed consent. Won't these things take a while?

D.A.: There's no doubt that it is an ambitious proposal. We hope we didn't overreach but rather that we tried to tackle the problem at the level required. The next steps are to first actually form the alliance later in 2013. In parallel, working groups will begin to develop thinking around technical standards, around ethics, et cetera.

We're envisioning a set of standards that would make it possible for like-minded people to exchange information and a menu of choices for ethics and privacy. One thing the group is very committed to is the principle that individuals should control the use of their own information. There are a lot of implications of that to make that real. So yes, we see there's years of work ahead. The concrete things will be to create the organization, the working groups, and within a year have initial standards for certain aspects of this that are being taken up and will hopefully create possible momentum as we see the benefits.

Q: Have any of the big biobank projects, such as Kaiser Permanente and Vanderbilt, declined to participate?

D.A.: They've not turned us down. There are only so many organizations that have thus far been part of the discussion. And so whether or not those organizations choose to participate will be up to them. I'm not aware of any organization that said no.

Q: But will some not be able to participate because of consent agreements with patients?

D.A.: No one knows the answer because there are no standards yet. We're going to try to come together and work together and develop a menu of options and technical standards to implement them so that people can make decisions.

If it turns out there's a set of data that, because of permissions, can only be used for certain purposes, then that's exactly what should happen. There will be data that can't be shared or can only be shared in limited ways.