Geneticists have sequenced the complete genome of the world's most studied microorganism, a bacterium called Escherichia coli. As reported in tomorrow's issue of Science*, the genome of this laboratory workhorse and common pathogen is 4.6 million base pairs long, making it the largest bacterial genome sequenced to date. This DNA includes 4288 genes, 40% of which are complete mysteries, says Frederick Blattner, whose group at the University of Wisconsin, Madison, began working toward this goal almost 15 years ago.
For decades, microbiologists, biochemists, geneticists, and cell biologists have studied this easy-to-grow organism to gain insights into the biochemical workings of life. More recently, a pathogenic strain found in tainted meat has made E. coli a household name. Many researchers hope the complete genome sequence of the innocuous laboratory strain will help them understand what makes the pathogenic strain so deadly.
Already, partial E. coli sequences that Blattner and, independently, a Japanese team have deposited in a public database have given researchers a taste of how valuable such data can be. Matches between new genes from other species, including mammals, and these E. coli sequences have often helped researchers pin a name and function on their discoveries. Now, with the full genomic overview, researchers can begin to form a coherent picture of E. coli's complete biology. "Having this complete set of instructions gets us one step closer to understanding how a free organism functions," points out Francis Collins, director of the National Human Genome Research Institute (NHGRI) in Bethesda, Maryland.
Blattner's team started its long decoding project in 1990, after receiving one of the first grants awarded by the NHGRI's predecessor, the National Center for Human Genome Research. First, the researchers broke up the E. coli chromosome into small pieces that could be used to build a map of genetic landmarks along the DNA. They then began sequencing fragments of the genome and used the map's landmarks to put the fragments in their proper order. By the end of 1994, Blattner's team had sequenced 1.4 million bases, and it finished the rest of the 4.6-million-base genome by January 1997. Along the way, Blattner and his colleagues searched for new genes by comparing the newly sequenced DNA to genes already in public databases. "We were discovering new things every day," says Blattner.
Now that researchers have E. coli's complete genome sequence, they can use it to find proteins that fill gaps in metabolic pathways or that are otherwise hard to identify. They can also compare E. coli's genetic makeup with that of other microorganisms sequenced so far, for example baker's yeast, Saccharomyces cerevisiae; Helicobacter pylori, the bacterium responsible for most peptic ulcers; and Methanobacterium thermoautotrophicum, an organism that thrives in hot springs. (Lists of completed and in-progress genomes are available at: http://www.tigr.org/tdb/mdb/mdb.html or http://www.mcs.anl.gov/home/gaasterl/genomes.html) "There are a lot of experiments that you can do that you couldn't do before," says Richard Roberts, a molecular biologist at New England Biolabs in Beverly, Massachusetts.