

A crowdsourcing project hopes to entice huge numbers of people to share their genetic test results.


Q&A: Crowdsourced personal genomes database slowly gains momentum

Computational geneticist Yaniv Erlich is known for attention-grabbing studies that harness big data. In 2013, his team showed they could identify some people in supposedly anonymized DNA databases by combining their data with searches in public databases. His team later linked genealogical information from 13 million people into a single family tree. And last year, Erlich and co-worker Joe Pickrell at the New York Genome Center and Columbia University made another splash by inviting people who have had their DNA tested by consumer genetics companies such as 23andMe to share their DNA reports with their group and others for research. The project, Erlich suggested, could potentially tap into the genetic data of up to 3 million people who have already sent off a saliva sample for DNA testing. And unlike these companies, the project can make consenting participants’ individual information (including, eventually, health data) available to a broad swath of researchers.

As of this month, the project has enlisted 32,000 participants. Erlich, who is giving an update on the effort this week at the annual meeting of the American Society of Human Genetics in Vancouver, Canada, told ScienceInsider that although that number is far short of even 1 million, he feels the project is on track and ready to move into new territory, such as working with disease advocacy groups.

Yaniv Erlich


Timothy Lee/Columbia Engineering

Q: Is 32,000 participants where the project hoped to be in a year?

A: It's phenomenal that we got so many people. Compared to other projects that try to crowdsource DNA or materials from people, I think we have achieved quite a lot. It takes time to build momentum.

Q: How many participants do you need to do research with the project's data?

A: It's already useful for research. We have a bioRxiv paper right now in which we used MinION sequencing [a pocket-sized sequencer] to see if we can identify people [using their known DNA profile] very rapidly. We used [the project] as a way to scan a large cohort and see if we can identify specific individuals like myself or Joe.

In another bioRxiv paper, a fantastic 16-year-old intern in my group put together a tool called DNA Compass. This was directly inspired by [the project's] user community. They want to learn more about their genome although they’re not geneticists. With DNA Compass, you can upload your genome data to the website, then search for individual SNPs [single-nucleotide polymorphisms] in your genome or for specific traits, let's say hair color or skin pigmentation.
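[The kind of SNP lookup described here can be illustrated with a minimal sketch. It is not DNA Compass's actual code. It assumes a tab-separated raw-data export with `rsid`, `chromosome`, `position`, and `genotype` columns and `#` comment lines, a layout used by some consumer testing companies. The function names, the sample data, and the rsIDs are hypothetical.]

```python
import io


def parse_genotypes(handle):
    """Return {rsid: genotype} from a tab-separated raw-data export.

    Assumed layout (not confirmed by the article): comment lines start
    with '#', data lines are rsid, chromosome, position, genotype.
    """
    genotypes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows rather than guessing
        rsid, _chrom, _pos, genotype = fields[:4]
        genotypes[rsid] = genotype
    return genotypes


def lookup_snps(genotypes, rsids):
    """Map each requested rsID to its genotype, or None if absent."""
    return {rsid: genotypes.get(rsid) for rsid in rsids}


# Hypothetical two-SNP sample standing in for an uploaded file.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t2\t2000\tCC\n"
)

genotypes = parse_genotypes(io.StringIO(SAMPLE))
result = lookup_snps(genotypes, ["rs0000002", "rs9999999"])
print(result)
```

[A trait search would additionally need a curated table linking traits to rsIDs, which this sketch does not attempt.]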

Q: What’s next for the project?

A: Every time we launch a new feature, we get literally thousands more people. We will get more and more as [the project] gains the confidence of the community and people see there are features that they love.

We are also working now with a patient advocacy group, the National Breast Cancer Coalition [NBCC], to test a questionnaire about breast cancer recurrence. NBCC will amplify the presence of our tool to survivors of breast cancer and to people in a family with breast cancer who took one of these genetic tests.

Our more ambitious plan is to see, only for people who really want that, if we can also crowdsource people's Facebook profiles. Getting genotypes is in a way a solved problem. Getting phenotypes [traits and characteristics of an individual] is much harder. The thing about Facebook is there are growing lines of research that show you can derive phenotypes from digital interactions. A great PNAS [Proceedings of the National Academy of Sciences] paper showed that just based on your likes, they [researchers] can derive your five big personality traits.