Biomedical research is often slow and incremental, but it can take a leap when someone uncovers a hidden connection. For example, researchers might never have tested a hunch that fish oil eases symptoms of Raynaud syndrome, a circulatory disorder, if an information scientist hadn’t taken the time to painstakingly scour stacks of technical articles on the two seemingly unrelated topics.
It’s likely that other game-changing links lurk elsewhere in the biomedical literature. But with new papers getting published every 30 seconds, scientists are hard-pressed to find those needle-in-haystack connections. Today, one group of researchers is launching a crowdsourcing initiative to pave the way, by harnessing the efforts of lay volunteers who will scan papers for key terms to help create a powerful searchable database.
This crowdsourcing curation campaign, dubbed Mark2Cure, is first reaching out to a particularly motivated crowd—the community of people affected by NGLY1 deficiency, a newly discovered genetic disorder. Researchers have diagnosed the disease—which is caused by defects in NGLY1, an enzyme that removes sugar molecules from proteins to ensure proper degradation—in about 35 people worldwide, but they believe some 1500 others may have it. The disorder has a bewildering array of symptoms that include liver problems, poor reflexes, an inability to produce tears, and sometimes seizures.
In 2012, exome sequencing confirmed the world’s first NGLY1 patient: Bertrand Might, a 7-year-old boy in Salt Lake City. Last fall, Bertrand’s parents, Cristina and Matt Might, learned through a conference tweet that Andrew Su, a bioinformaticist at Scripps Research Institute in San Diego, California, was seeking lay volunteers for a curation project. Su and colleagues had previously shown that novices paid about 7 cents per abstract can curate reliably. In that study, the researchers found curators through Mechanical Turk, a Web platform for harnessing human intelligence for things computers can’t do well. “The next step was to reduce that [cost] to zero, to see if we could get volunteers to help us,” Su says.
The Mights were eager to contribute and got the NGLY1 community on board. “It was very clear they were interested and willing to attack this problem from different angles,” Su says. As a computer scientist at the University of Utah in Salt Lake City, Matt Might shares Su’s passion for managing and sharing data. The Mights have also seen the power of data sharing in a personal way. Matt’s essay about Bertrand’s 4-year journey to diagnosis turned out to be instrumental in identifying new NGLY1 patients.
The National Institutes of Health (NIH) already spends millions of dollars hiring professional curators to do this sort of work. Now, the Scripps team aims to engage laypeople who are able and willing to do the same job for free, in small chunks at a time on their own computers. Building the knowledge base requires humans to teach computers key concepts from curated articles; with modest online training, anyone who reads English can scan research papers for key terms (names of genes, proteins, diseases, and drugs) and use online marking tools to document relationships between them (for example, drug X treats disease Y). In an experiment completed a few months ago, the Scripps researchers found that although the average novice doesn’t curate as well as a person with a doctorate, groups of novices actually perform on par with, or even slightly better than, a professional.
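One way to picture how a group of novices can rival a single expert is simple voting: keep only the terms that enough curators marked independently. The sketch below is illustrative only. The article does not describe how the Scripps team combines annotations, so the `Span` format, the `aggregate_by_vote` function, and the vote threshold are all assumptions, and the sample data is made up.

```python
from collections import Counter

# Hypothetical annotation format: (start offset, end offset, concept type).
Span = tuple[int, int, str]


def aggregate_by_vote(annotations: list[set[Span]], min_votes: int) -> set[Span]:
    """Keep each span that at least `min_votes` curators marked."""
    # Count how many curators marked each distinct span.
    counts = Counter(span for marked in annotations for span in marked)
    return {span for span, votes in counts.items() if votes >= min_votes}


# Three made-up curators marking one made-up abstract.
curators = [
    {(0, 5, "gene"), (20, 35, "disease")},
    {(0, 5, "gene"), (40, 48, "drug")},
    {(0, 5, "gene"), (20, 35, "disease")},
]

consensus = aggregate_by_vote(curators, min_votes=2)
print(sorted(consensus))  # [(0, 5, 'gene'), (20, 35, 'disease')]
```

In this toy example, the "drug" span marked by only one curator is dropped, while the spans that two or more curators agree on survive.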
Mark2Cure’s current project “is all about showing that the output of [volunteer] efforts is scientifically meaningful,” Su says. To ensure the results are applicable to research on NGLY1, the Mights helped connect the Scripps team with glycosylation expert Hudson Freeze. Freeze runs a lab at Sanford-Burnham Medical Research Institute, a mere 180 meters from Scripps. He and co-workers did some of the experiments that helped confirm that NGLY1 mutations caused Bertrand’s disease.
Since then, Freeze’s lab has created cell lines using samples collected from 10 NGLY1 patients, and he hopes the crowdsourcing campaign will yield testable new hypotheses. For example, the curation effort could uncover information that suggests specific biomarkers or readouts researchers could test in the lab to gain insight into disease mechanisms—or, in the best case, lead to a cure for NGLY1 deficiency.
Although today is Mark2Cure’s official launch, a small group of curators has already taken a first look at more than 100 abstracts. To achieve reasonable accuracy, each abstract needs to be annotated by at least 15 different lay curators, says Ginger Tsueng, Mark2Cure’s scientific outreach project manager. The team is hoping the citizen scientists will eventually curate thousands of articles selected for their potential relevance to NGLY1 deficiency. Participants rack up points for their efforts, up to 1000 per abstract, depending on how well their markings match those of other participants. There is no “leaderboard” or prize, Su says. Citizen scientists seem plenty motivated “to contribute to something bigger than them.”
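A points scheme based on agreement could work along the following lines. This is a rough sketch under stated assumptions, not Mark2Cure’s actual formula: the article says only that an abstract is worth up to 1000 points depending on how well markings match, so the overlap measure (an F-score averaged across the other curators), the `agreement_points` function, and the sample terms are hypothetical.

```python
def agreement_points(mine: set[str], others: list[set[str]], max_points: int = 1000) -> int:
    """Scale the mean overlap between `mine` and each other curator to points.

    Assumption: overlap is measured with an F-score (2 * shared / total marked).
    """
    def f_score(a: set[str], b: set[str]) -> float:
        if not a and not b:
            return 1.0  # two empty markings agree completely
        return 2 * len(a & b) / (len(a) + len(b))

    if not others:
        return 0  # nobody to compare against yet
    mean_f = sum(f_score(mine, other) for other in others) / len(others)
    return round(max_points * mean_f)


# Made-up markings for one abstract.
mine = {"NGLY1", "seizures"}
others = [{"NGLY1", "seizures"}, {"NGLY1"}]

points = agreement_points(mine, others)
print(points)  # 833
```

Here the participant matches one curator exactly and half-overlaps with another, which lands between zero and the 1000-point ceiling.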
Eileen Estenik, a part-time secretary in Mexico, Missouri, is one of the volunteer curators. She learned of Mark2Cure through the Mights after her 2-year-old son Benjamin was diagnosed with NGLY1 deficiency in November 2014. Before that, the boy was treated for seizures and had two rounds of chemotherapy for a liver tumor that eventually warranted a liver transplant last summer. “We’ve gone through a lot,” Estenik says. She got to work curating during Mark2Cure’s soft launch and so far has racked up more than 50,000 points. “We want our adorable little boy and all his buddies to have the best life we can give them,” Estenik says. “And if that means we have to fight through all those documents to help our doctors, so be it.”
Scientists also think Mark2Cure could pay off. “At this time people can find useful information [within publications] much better than machines,” says Mike Cherry, a biologist at Stanford University in Palo Alto, California, who is involved with ClinGen, an NIH-funded curation project to build a database of genomic variants in precision medicine and research. “[Su] has devised an interesting and intriguing approach to triage the literature. I am sure the results will be very interesting.”