Scientists vying for a $1 million prize in DNA analysis from the Department of Defense are crying foul. Several of the top 10 contenders charge that the rules and scoring of the Defense Threat Reduction Agency's (DTRA's) Algorithm Challenge to detect bioterror threats in samples of DNA were unclear. When the submission period closed on 14 July, three out of 103 "solvers"—including both individuals and teams—had made the final cut and will now have their computer programs scored in the final evaluation round.
"The way they organized the competition and the way they scored it was just horrible," says David Ainsworth, a bioinformatics Ph.D. student at Imperial College London, whose team is still in the running for the prize.
Organizers admit that the contest was tough and say that figuring out where to set the bar was tricky. "You don't want to make it so high that nobody can win, but you don't want to set the bar so low that you have 200 people tied for first place," says Christian Whitchurch, the challenge's project manager. The fact that three teams succeeded makes him optimistic that the final product will be of great use to the government.
But several participants say that the scoring system was needlessly complicated and didn't reflect the quality of their work. "In terms of the goal of the contest as I understood it, I think it's unlikely that the best algorithm will win," says bioinformaticist Steven Salzberg of Johns Hopkins University in Baltimore, Maryland. He did not participate, but saw two scientists in his lab—a Ph.D. student and a bioinformatics engineer—narrowly miss the cut.
In January, DTRA, a Pentagon agency charged with a wide range of security tasks, including trying to spot and avert a possible bioterror threat, laid out the challenge: Find a radically faster and more accurate way to identify the species and genes in raw DNA. As DNA sequencing becomes faster and cheaper, DTRA hopes that the challenge will inspire a highly accurate program to detect potentially dangerous organisms in DNA samples in under an hour—a vast improvement over today's capabilities.
To manage the contest, DTRA chose InnoCentive, a company that hosts online challenges on behalf of "seekers" and accepts submissions from potential "solvers" around the world. This is the largest prize that DTRA has offered to date, and it represents a departure from the agency's normal way of conducting business. By accepting entries from anyone who registered online, DTRA hoped to tap the knowledge of a diverse group of scientists without the red tape of the bidding and contracting process, Whitchurch says.
For participants, competing requires a high-risk investment of time up front, but it also eliminates the traditional barriers of the grant process. "Having the skill to actually solve something and having the skill to write a proposal to get the money—they're two different skill sets," says Daniel Huson, a bioinformaticist at the University of Tübingen in Germany, whose team is among the qualifiers. Enticed by the million-dollar prize, roughly 2700 people signed up for the challenge, about half of them from the United States, and 103 of these submitted work. Huson and his teammates dropped everything to devote 17-hour days to the project, but he says that the atmosphere was exhilarating and the goal worthwhile.
The entrants received nine data sets, each containing a mix of genetic code from unknown sources; their job was to design a bioinformatics program that would identify the organisms in the mix and describe their individual genes. They submitted results to an automated scoring system, which sent back an accuracy reading between zero and 100. To qualify for the evaluation round, they had to earn a minimum score on each of the nine data sets.
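The qualification rule is strict: a single weak data set sinks an entry, no matter how strong the other eight are. A minimal sketch makes this concrete (the threshold value here is an invented placeholder; DTRA's actual cutoffs were not published in this account):

```python
# Hypothetical illustration of the challenge's qualification rule:
# every one of the nine data sets must clear a minimum accuracy score.
# The threshold of 50.0 is an assumption, not DTRA's real value.

def qualifies(scores, threshold=50.0):
    """Return True if all nine accuracy readings (0-100) meet the minimum."""
    return len(scores) == 9 and all(s >= threshold for s in scores)

# All nine sets above the assumed bar -> qualifies.
print(qualifies([72, 65, 80, 55, 61, 90, 70, 58, 66]))  # True
# One weak data set (45) disqualifies the whole entry.
print(qualifies([72, 65, 80, 45, 61, 90, 70, 58, 66]))  # False
```

This all-or-nothing structure helps explain the frustration contestants describe below: improving eight data sets counts for nothing if the ninth stays under the bar.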
From the outset, says participant Derrick Wood, it was unclear how specific his program's answers should be and how they were being scored. Wood, a computer science Ph.D. student at the University of Maryland, College Park, who did not qualify for evaluation, says that he lost points for not naming the strain of an organism in some cases, but received no extra points when he added strain information in others. After asking for clarification on which genetic database he should use as a reference, contestant Robert Edgar, a self-funded computational biologist, says he was told in DTRA's e-mail newsletter that a database called RefSeq was fair game, but later found that using it left out crucial information and lowered his score. Edgar, who also just missed the cut, says that DTRA was vague in addressing questions via periodic e-newsletters.
To check how well their algorithm was working, participants could repeatedly submit their results for scoring, but there was a monthly limit on submissions, and several contestants reported that the scoring algorithm was fraught with bugs. Edgar recalls that the system was taken offline and then reappeared with entirely different scores on the leaderboard. Relying on this mysterious numerical score turned the challenge into an exercise in reverse engineering, Ainsworth says. "We spent the last month trying to get the [scoring] algorithm to tell us that we've done well, instead of actually doing the proper science to produce a good result," he says.
Because no one qualified during the first 4 months of the challenge, DTRA relaxed some of the rules. It pushed back the original 31 May deadline to 30 June, removed two required data sets, upped the number of times entrants could submit results for scoring, and lowered the accuracy requirements. As contestants continued to struggle, the deadline was again bumped to 14 July. "I admit that I was nervous when we only had a month to go and no one had met the threshold," Whitchurch says.
With less than a week before the deadline, participants received an e-mail encouraging them to team up to improve their scores. There was "a flurry" of desperate e-mails among teams in the last 24 hours, says Ainsworth, who got five such invitations and partnered with another solo competitor in the nick of time. A newly formed team could use outputs from different algorithms on different data sets, swapping results to piece together a passing score. "It's just a completely artificial way to game the system," says Edgar, who believes the algorithms that qualified may be no better than the rejected ones.
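Edgar's complaint becomes clearer with a toy example. Because qualification hinged on the minimum score across data sets, two merged teams could simply submit, for each data set, whichever member's algorithm scored higher, lifting the combined minimum without improving any algorithm. The numbers below are invented purely for illustration:

```python
# Hypothetical sketch of the result-swapping strategy: a merged team
# submits, per data set, the higher-scoring member's output.
# All scores are invented for illustration.

def best_combined(scores_a, scores_b):
    """For each data set, take the better of the two teams' scores."""
    return [max(a, b) for a, b in zip(scores_a, scores_b)]

team_a = [80, 40, 75, 60, 55, 82, 47, 68, 71]  # strong on some sets
team_b = [45, 77, 50, 66, 70, 49, 73, 52, 69]  # strong on the others

combined = best_combined(team_a, team_b)
# Neither team alone clears a minimum of 66, but the combination does.
print(min(team_a), min(team_b), min(combined))  # 40 45 66
```

Neither algorithm got better, yet the merged entry's weakest data set jumped from the 40s to 66 — the "completely artificial" gaming Edgar describes.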
The three qualifiers have now uploaded all of their code for further testing in the evaluation round. For the first time, speed will play a role in the competition, as programs go head to head on both the original data sets and a brand-new DNA sample. If no team meets the requirements, Whitchurch says, some entrants who missed the cut may still be considered.
Even the qualifiers can't be sure their work meets the original requirements. Ainsworth and his teammate tweaked one of their algorithms but have no idea whether it can handle the original nine data sets. He'll find out in September, when the results of the evaluation phase are announced. "I'm just treating it as a lottery now," he says.