Researchers have published what may be the largest validated family tree ever: a genealogy database stretching back 5 centuries that links 13 million people related by blood or marriage. The tree has already yielded insights into the link between genes and longevity and into why our ancestors married whom they did. And researchers say that’s just a start.
“This study is an impressive and clever use of crowdsourcing data to address a number of interesting scientific questions,” says geneticist Peter Visscher of the University of Queensland in Brisbane, Australia, who was not involved with the work. The tree’s bigger promise, he and others say, could come if it were linked to health information to explore the role of genetics in diseases.
Computational geneticist Yaniv Erlich of Columbia University says he thought up the project 7 years ago, after he got an email from a distant cousin through a website called Geni.com, where people share their family trees. He emailed the company’s chief technology officer, who gave him his blessing to download the site’s tens of millions of public profiles listing a person’s name, sex, date and place of birth, date of death, and immediate relatives (but no DNA information). Figuring out how to make sense of the data, verify relationships, and fix errors took time—his team presented an early version of the tree at a meeting more than 4 years ago—and they later added more data, giving them a starting point of 86 million profiles.
The final result is a single pedigree connecting 13 million relatives mostly of European descent, dating back 11 generations. It includes, among others, famed population geneticist Sewall Wright and actor Kevin Bacon, Erlich says. (Geni now has 120 million connected profiles and other ancestry sites have large numbers, but the family trees within them have not been validated in the same way.)
Looking across the tree’s death data, the team found expected fluctuations in life span: a drop for young men during the Civil War and World Wars I and II, and a rise in childhood survival in the 1900s. By plotting births on a global map over time, they charted major migration events, such as the Mayflower landing in 1620 in present-day Massachusetts, which was soon followed by a burst of births in the region, and the 1788 founding of the British penal colony that began Australia’s colonization.
The tree also yielded a new estimate for how much of our life span is determined by genes: just 16%, compared with an estimate of about 25% from studies of twins in Scandinavia. (Like the tree, the twin studies don’t directly analyze DNA, only life spans and relationships.) The lower heritability figure suggests that longevity has even more to do with environment and behavior than had been thought, the team reports today in Science.
Scandinavia’s peaceful history may give genetics a larger role there, says Kaare Christensen, who heads the Danish Twin Registry in Odense, Denmark. But he notes that twin studies also lack the statistical power of the new study. The new study’s lower figure of 16% suggests that for researchers hunting for longevity genes, “It might be slightly more difficult than we thought,” he says.
Erlich’s team also explored what he calls “who and where is the love of your life.” In 1700, people typically married a fourth cousin born 10 kilometers away; starting around 1850 they married less genetically related partners. Experts had thought this shift reflected a growing distance between where partners were born, but Erlich’s study found that distance didn’t explain it. Instead, a cultural factor such as a taboo on marrying a cousin may have arisen around this time and led to fewer marriages between relatives. “All of this helps us to understand how genes spread in a geographical area,” Erlich says.
These findings only scratch the surface of the family tree’s potential uses, says Erlich, who last year became chief science officer at MyHeritage.com, which owns Geni.com. The Columbia team has begun to mesh the tree with a site called DNA.Land, where volunteers fill out health surveys and share their DNA genotyping data from consumer DNA test services such as 23andMe and MyHeritage.com. Erlich’s team is also making the data set (stripped of names) available to other researchers.
Overlaying health data for tens of thousands or hundreds of thousands of living and dead family tree members could allow researchers to firm up the role of genetics in diseases and traits such as height, as the Icelandic company deCODE Genetics has done by combining DNA and health data with the country’s extensive genealogy, Erlich says. In this way, Visscher agrees, the 13-million-person family tree could be a “formidable data resource to tackle nature-nurture questions.”
*Correction, 14 March, 10:12 a.m.: The article has been clarified to indicate that although there are ancestry sites with larger numbers of connected profiles, the 13-million-person family tree is the largest validated pedigree published.