Sequencing the human genome in the 1990s was supposed to reveal the entire universe of genes important to health and disease. But a handful of recent studies have shown that, surprisingly, researchers still focus mainly on only about 2000 of the roughly 19,000 human genes that code for proteins.
Thomas Stoeger, a systems biologist at Northwestern University in Evanston, Illinois, wondered why. Now, after conducting a massive bioinformatics analysis reported today in PLOS Biology, he thinks he knows. Some of the reasons are obvious; others, less so.
Stoeger, Luís Amaral, and colleagues scoured several dozen databases and other resources to compile 430 features of more than 12,000 genes, such as when a gene was first discovered and the chemical and physical properties of its protein. Machine learning algorithms then worked through those data to find correlations with measures of popularity, such as number of publications on a gene and National Institutes of Health funding devoted to it. Unexpectedly, the analysis found that a combination of just 15 gene traits can largely predict how popular a gene has been and whether study of it has led to a medical drug.
Science talked to Stoeger about why such DNA favoritism matters and how biologists can force themselves to unearth hidden genome gems. This interview has been edited for clarity and length.
Q: So why do researchers have a bias for certain genes?
A: Genes that express more protein get more attention because they’re easier to study: there is more material to put through an assay. Similarly, it’s easier to study genes expressed in many tissues throughout the body than genes expressed in just one or two places. And genes that have a big impact when they’re mutated or disabled in cells or mice are also attractive to scientists because they are more likely to have big impacts in the body.
Q: Are there incentives for researchers to only study more popular genes?
A: Ph.D. students and postdocs who work on less-studied genes have a 50% lower chance of becoming a group leader because it’s harder for them to get funding. So they kind of get pushed out.
Q: You found that a lot of genes scientists are ignoring could be medically important. How?
A: We asked whether genes with strong evidence linking them to a disease, from studies comparing large groups of people with that disease against people without it, are studied more than other genes. We do indeed find this trend, so that’s good. But we also find a lot of disease-linked genes that are not well studied. This shows that there is the potential for finding new drugs and treatments to help people by understanding the biology surrounding some of those ignored genes.
Q: How can you get researchers to do that?
A: We don’t know for sure. I think part of the solution will be for funding organizations to dedicate part of research efforts to exploring some of those less characterized genes. Presently we are not supporting researchers to do that at all. There would also be a need for adjustments to how these researchers are supported. Maybe they need more time. Maybe they need to create new tools because the genes they’re looking at are harder to study.