HINXTON, U.K.--When two rival groups published their versions of the human genome in February, each challenged the quality of the other's results. Now, the first formal comparison of the public and private genome maps confirms that there are indeed major differences--and suggests that the less extensive version, produced by Celera Genomics of Rockville, Maryland, is more accurate.
At the Genome Informatics Conference here on 9 August, Colin Semple, head of bioinformatics at the Medical Research Council's Human Genetics Unit in Edinburgh, described an analysis of a 6.9-megabase stretch of chromosome 4 (4p15.3 to p16.1), a region implicated in bipolar disorder. The region--at least 96% of it--had been sequenced independently by Kathy Evans and her group at the University of Edinburgh using a map-based method.
Semple's team compared the sequences published by Celera Genomics and the publicly funded Human Genome Project (HGP), using the Evans team's data as a yardstick. Contrary to speculation, Celera's approach of breaking the whole genome into random fragments for sequencing yielded better data than the map-directed approach used by HGP. For this swath of DNA, Celera made half as many "misassemblies"--putting a fragment in the wrong order, or flipping it--as the public effort did, logging 2.08 misassemblies per megabase.
However, Semple's team found that the Celera stretch is still full of holes: Celera had sequenced only 23% of the region, while HGP had managed 59%. Semple notes that his group analyzed data that were publicly available as of 1 September 2000, so both sequences undoubtedly have been polished since then. And it's unknown whether the accuracy rates in this chromosome 4 region can be extrapolated to other regions.
Semple's presentation provoked surprisingly little rancor. Indeed, the comparative study's implications depend on one's point of view. "It's a question of whether you want Havarti or Swiss cheese. The public assembly doesn't have that many holes, but the holes it does have are much bigger," says Jim Kent of the University of California, Santa Cruz, author of the computer program used for the initial assembly of the HGP genome sequence. Kent says scientists considering a subscription to Celera's database should first examine the region of interest in the public database: "If it's in good shape, then praise the lord they've just saved themselves $20,000."