This Special Advertising Section is brought to you by AAAS OPMS

Technologies in Genomic Research
The effort to sequence the human genome has both relied on and stimulated new technologies, from rapid sequencing techniques to methods of handling huge amounts of data. Here, we examine the current achievements and near-term promise of the project.

by Peter Gwynne and Guy Page

American Physiological Society


Genetics Computer Group (GCG)

Genetix Ltd


LI-COR, Inc.

Mergen Ltd.

MWG Biotech AG

OriGene Technologies, Inc.


Pyrosequencing AB

Roche Molecular Biochemicals



A few years ago, the task of completing the entire DNA sequence of the human genome seemed monumental. Even optimists assumed that the work would stretch out far into the new millennium. Now, in the very first month of that millennium, the picture has changed dramatically. We can count the months until the complete book of Genetica humana will be ready to read. And we are beginning to see clearly some of the benefits that will stem from that extraordinary piece of literature. "This is a story that's not over," says Francis Collins, director of the National Human Genome Research Institute in Bethesda, Maryland. "Very exciting things lie ahead."

Exciting advances have already emerged. Less than two months ago, for example, participants in the international Human Genome Project (HGP) announced that they had completed a chapter of the book by mapping virtually an entire human chromosome for the first time. A week before that, the National Academy of Sciences held a celebration to mark the completion of one billion sequenced base pairs of DNA and their deposition into an international gene bank. That achievement represents about one-third of the entire complement of the human genome. On the very same day, an international collaborative group, the SNP Consortium Ltd., released about 2,300 newly identified and characterized single nucleotide polymorphisms (SNPs) into the public domain. These minute genetic variations in human DNA provide signposts on the human genome that will help scientists identify genes associated with diseases. In coming months, the international genome community expects advances such as these to occur at increasingly rapid rates.

The remarkably rapid progress of the human genome venture has pushed what began as philosophy into the realm of practicality. Many new issues have arisen and new scientific sub-disciplines have been created as a direct result of the Human Genome Project and similar scientific ventures. Research teams have already started to test gene therapies based on the new understanding made possible by the HGP. Indeed, the extraordinary progress of human genome sequencing has opened up possibilities that were inconceivable when the project started just ten years ago.

At the time, sequencing a single genome was a truly daunting project. Now, life scientists have begun to focus on variations among different groups of individuals and different individuals themselves. These variances represent both the most fascinating and, because they can possibly help medical professionals to target treatments in individuals and groups of people, potentially one of the most practical outcomes of the genome project. But they are far from the only form of new technology and medical benefit likely to emerge from the work of the past decade. "In the longer term, one hopes to see the sequencing of single molecules become a possibility," says Collins. "Suddenly it's feasible to do immediately what we had thought about doing only five to ten years from now," adds John Donelson of the University of Iowa, who is participating in sequencing the genome of African trypanosome parasites.

In this report, we take stock of where the Human Genome Project is now, how it has reached its present state, and the immediate issues it has raised. In a further report, we will peer into the crystal ball to predict where the results of the project will lead the scientific and medical communities.

The Basic Technologies

Scientists in the HGP consortium, led by groups at the Sanger Centre near Cambridge, UK, and Washington University, St. Louis, use a painstaking approach to sequencing the human genome. They start with a mixture of sperm and blood cells from several volunteers. First, they divide the material into the 23 pairs of chromosomes that contain all human genes. Then, the teams remove sections of DNA from individual chromosomes, clone them using routine techniques, and separate the clones into individual matching pairs of DNA bases using gel electrophoresis, a process that divides molecules into groups of the same size by forcing them through pores in a thin layer of gel. By identifying each DNA base pair and matching them to the pairs on either side of them in the double helix, scientists gradually build up a sequence for gene segments, whole genes, complete chromosomes and, finally, the entire human genome. The approach moves step by step, carefully linking together the pieces of each chromosome, like assembling a jigsaw puzzle by starting at one corner and carefully moving out from that area.

Strength in Numbers

The remarkably rapid advance of the Human Genome Project has sparked progress in efforts to sequence the genomes of several other creatures. A typical example is the sequencing of the genome of the African trypanosome, a protozoan parasite that causes African sleeping sickness.

"The fact that the human genome work showed the feasibility of sequencing megabase stretches of DNA made us realize that the African trypanosome genome was well within the range of today's technology," says John Donelson of the University of Iowa, who coordinates an annual World Health Organization meeting on the work. "The major step over the past five years has been realizing that having the brute force of many sequencing machines working 24 hours a day can do the job."

The trypanosome project has borrowed from other sequencing efforts. "An advance in the yeast field has impacted on us: the demonstration that it's feasible to knock out systematically every gene of the organism once you know the genome," says Donelson. "It gives us a framework in which to try to identify the virulence genes in these parasites."

What's the message from that experience? "The bottom line for me is that you can't work on your organism unless you take account of all the other genome information that's out there," says Donelson. "You need to keep in touch with what's going on in all the genome work."

A dramatic new approach emerged 20 months ago. J. Craig Venter, chief scientist and president of Celera - a biotechnology firm based in Rockville, Maryland and owned by PE Corporation - and his former colleagues at The Institute for Genomic Research (TIGR) in Rockville, Maryland started to use a short cut that, Venter claims, will help to sequence the human genome much faster and more cheaply than the HGP consortium. Instead of studying small segments of chromosomal material, Celera's shotgun approach shreds an organism's entire complement of genetic material into tiny pieces. After analyzing the pieces, scientists use powerful computers to pull together all the pieces of information into a recognizable whole. In effect, this approach aims to solve the jigsaw puzzle by throwing all the pieces on the table and figuring out where all should go simultaneously.
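The jigsaw analogy can be made concrete with a toy example. The sketch below is not Celera's assembler; it is a minimal greedy overlap merger, written under the simplifying assumptions that reads are error-free, come from a single strand, and contain no repeats. The function names `overlap` and `greedy_assemble` and the `min_len` threshold are illustrative choices, not anything described by the researchers.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap.

    Toy assumptions: no sequencing errors, no repeats, one strand only.
    """
    # Drop exact duplicates, then reads wholly contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            # Nothing left to join: return whatever contigs remain.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Three overlapping fragments of a made-up 15-base sequence.
    fragments = ["TACGTTAGC", "ATGGCGTA", "GCGTACGT"]
    print(greedy_assemble(fragments))
```

Every fragment is compared with every other fragment on each pass, so the work grows quickly with the number of pieces, which gives some feel for why the computing side of the shotgun approach matters so much.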

The method has progressed significantly since 1995, when TIGR scientists first used it to sequence the genome of the bacterium Haemophilus influenzae. "With the first genome, calculations took on the order of 5 to 11 days," recalls Venter. "Now, with the help of Compaq's new Alpha chips and Compaq computers, we've reduced the calculation to under five minutes." A pilot project in which Celera scientists sequenced the genome of the fruit fly Drosophila melanogaster provided some validation of the shotgun method.

In some ways, the approaches are complementary. Celera's technology can identify a rather fuzzy forest of DNA base pairs on the human genome, while the more precise HGP methods can focus in on individual trees and their locations in the forest. The two groups have recently started to talk about collaborating. "Discussions have been going on for a while, and they are quite serious," says Collins. One potential roadblock concerns different attitudes toward making the results of research available to the scientific community. The HGP, as a publicly supported consortium, has a mandate to submit all its DNA data to a public data bank within 24 hours. Celera, a for-profit company that has benefited from the HGP's publicly available data, wants to hold on to potentially patentable information for longer periods. "The challenge is to come up with a model that recognizes the need to have all the data in the public domain," says Collins.

Iterative Advances

Taking the broad view, one of the most surprising facts about the entire effort to sequence the human genome is that it has accelerated far faster than its originators expected, despite the lack of breakthrough technologies. University of Wisconsin geneticist Fred Blattner, a pioneer in sequencing E. coli bacteria, likens the similarities between the sequencing technologies of 1990 and today to those between an automobile of the 1920s and a 2000 model car: The fundamentals scarcely differ.

"Very little has come up to beat the electrophoretic technique invented by (British biochemist and Nobel Laureate) Fred Sanger. The current capillary electrophoresis devices aren't very different from Sanger's gel-based method," Blattner points out. "When scientists started on some relatively small genomes, they invented random shotgun sequencing. That's the same now. Computer involvement has been extremely critical, but it was recognized in the first pieces of sequencing work. And the software functions involved in sequencing are similar. What has happened between the early days and now has been scaling up of the sequencing as more computing power has become available."

Rick Wilson, who heads the Genome Center at Washington University, St. Louis, agrees. "It's been tough to rely on expected innovations, other than to say that we expect to have a ten percent increase in efficiency every year," he says. "Back in the late 1980s, many people said the human genome couldn't be sequenced without a major technological breakthrough. We weren't content to wait for that breakthrough. We relied on incremental technological advances. Every year we've managed to figure out a way to get more out of our sequencing machines."

Those ways have involved creative advances in technology, if not significant breakthroughs. Capillary-based instrumentation for sequencing, developed first by Amersham Pharmacia Biotech and then by PE Biosystems, for example, has given scientists who undertake gene sequencing the ability to collect as many as one million base pairs of raw data per day. "Capillary sequencing will have a major impact on the project's speed, because of labor considerations," says Wilson. The technology will continue to improve. "We're creating the next generation of DNA sequencers, which will increase throughput and decrease cost, each by an order of magnitude," says David Barker, vice president and chief science adviser for Amersham Pharmacia Biotech. "We're using microfabricated channels in a glass chip that allow sequencing of 500 bases in 15 minutes."

Another key advance has occurred in computing capability. "Life scientists have really underestimated the role of high-end computing in genomics," says Venter. "In sequencing the genome of the fruit fly, we used every bit of the capacity of the Celera supercomputer. Sequencing the human genome will be even more of a challenge."

Scientists in both the HGP and Celera feel confident that they can meet the challenge. After all, the public record shows that researchers have already sequenced about 30 genomes, of varying levels of complexity. Several scientists expect the total to reach 100 by the end of this year. Organisms already sequenced include E. coli, with roughly 4,500 protein-encoding genes, and yeast, with 6,000, sequenced by Ronald Davis of Stanford and a consortium of scientists in Britain and continental Europe. (In comparison, plants have roughly 20,000 protein-encoding genes, and humans have 100,000.) As a result of international archiving efforts, researchers around the world can regularly scan databases of DNA or protein sequences.

Beyond DNA Sequencing

As iterative advances in sequencing technology have accumulated, they have started to change the goals of the HGP. The concept of "finishing the genome," for example, raises as many questions as it answers. Participants in the effort widely accept the fact that some chromosomal regions of the human genome will remain impossible to characterize; they are too repetitive and unstable to be unraveled by any sequencing technology. Fortunately, those regions seem to have relatively little significance in the genome's overall function.

A deeper issue involves the idea of the genome itself. As originally conceived, the HGP had the goal of sequencing a single genome representative of all humanity. Now, gene sequencers realize that much of the potential and excitement of the project lies in identifying differences between the genomes of groups of people and, eventually, between individual humans.

In that context, SNPs have come to play an ever more central role in human genetic analysis. Scientists do not know whether these single-base variations in the DNA sequence, which are scattered throughout the genome, directly affect gene function. But they can use the variations as extremely effective tools for characterizing individuals or groups quickly and inexpensively.

Each of us has a unique constellation of variations in our DNA base sequences. By using the polymerase chain reaction (PCR) in combination with other newly minted techniques, scientists will be able to create a genetic pattern that uniquely identifies individuals or groups of individuals. "There's a great need to detect variations and to understand how they work in disease pathways," says Collin D'Silva, CEO of Transgenomic Inc.

In addition to defining such pathways, genetically defined characterizations have great potential for associating specific patterns defined by SNPs with manifest characteristics, such as responses to drug treatments. Thus, pharmaceutical companies could use SNP data to develop particularly effective treatments for small groups of individuals or to identify - and warn - individuals likely to suffer severe side effects from specific drugs.

The Reason Y

"I've been obsessed with the Y chromosome, and more generally the sex chromosomes and how you make males and females," says David Page of the Massachusetts Institute of Technology's Whitehead Institute. "As an aficionado of the Y chromosome, I have a special place in my heart for the Human Genome Project."

Page hopes that the completed human genome sequence will reveal how and why the sex chromosomes evolved and acquired their special niches. His group has discovered that the X and Y sex chromosomes - males have one of each, while females have two Xs - were originally a pair of identical chromosomes. "I'm looking to a day when we will be able to speak generally about what distinguishes the sex chromosomes from ordinary chromosomes," he says.

That understanding won't come from the human genome alone. "When we first see the genome of a bird, the sex chromosomes will be one of the first items to look at," Page continues, "for in birds it is the female, not the male, who has the odd chromosome. This grand experiment of Nature - turning an ordinary pair of chromosomes into sex chromosomes - has played out independently several times." Other genomes of interest to Page: those of crocodiles, turtles, and tortoises, which have no sex chromosomes. Their sexes depend on the temperatures at which their eggs incubate.

Ultimately, the complete human genome may answer what Page calls "the more fundamental question: Why bother to have sex in the first place?" He is confident that this line of study "will engage the interest of a lot of biologists who might think that the genome project is a bit arcane."

The technology could also expand the pharmaceutical armamentarium. "Many drugs have not been certified by the United States Food & Drug Administration, because they have an adverse effect on some part of the population," explains Douglas Gjerde, Transgenomic's chief science officer. In principle, organizers of clinical trials could use SNP data to identify those segments of the population. Doing so could theoretically enable the FDA to approve specific drugs on the condition that they not be prescribed for individuals found to be vulnerable to the drugs on the basis of their genomes.

Not surprisingly, corporate laboratories have started to develop new means of tracing SNPs. Transgenomic, for example, has developed a chromatography-based technology that, the company says, detects SNPs rapidly and cheaply. "It's a DNA difference engine," explains principal scientist Paul Taylor. "It will tell you if your sample differs from a reference sample." The technology has an installed base of more than 200 units. Researchers are using it to detect individual genetic variations in different genes, such as those related to breast cancer, lung cancer, cystic fibrosis, multiple sclerosis, Marfan syndrome, and many other diseases.

Swedish company Pyrosequencing AB, meanwhile, is about to launch a sequencing technology that will initially target SNP analysis. It uses an enzyme cascade system to generate light as nucleotides are incorporated onto a single-stranded DNA containing the SNP. "Once you've sequenced the genomes, you need to use the information," says Helena Nilshans, Pyrosequencing's product manager. "Pyrosequencing helps you to benefit from all the markers that scientists are investigating." Beta tests have shown that the method is greater than 99 percent accurate. That's important, for example, in marker validation, continues Nilshans, "because a large error rate means that you'll have to increase the number of analyses to get a statistically significant result."

Mining the Data

As the example of SNPs illustrates, the key post-sequencing issue involves the use of the data. "We're in a time very much like the Renaissance, going from almost no information a few years ago to almost complete information a few years from now," points out David Eisenberg of the University of California Los Angeles (UCLA). "But information is not knowledge. It must be synthesized into knowledge."

Making the jump from raw sequencing data to a functional understanding of the way life works involves more than a few inspired ideas and back-of-the-envelope calculations. Because of the vast mountains of genetic data assembled by genome sequencers, the issue has taken on a needle-in-the-haystack coloration. Fortunately, information technologists have spent several years grappling with the problems of assembling data banks and distilling meaning from the huge amounts of data these repositories contain.

Data mining programs use a variety of advanced technologies, including artificial intelligence and neural networks, to identify significant clusters of data from among the info-rubble. Several companies have developed tools that help researchers interrogate genetic data, and many biopharmaceutical programs have emerged that systematically dig for genetic gold. "We're at the intersection of the two most important fields in the world right now: the structuring and dissemination of information," says John Devereux, chief scientific officer for the Oxford Molecular Group's bioinformatics effort. "The domain we're trying to structure is modern biology."

In the United States, the National Library of Medicine acts as the repository of publicly available genome sequencing data. Oxford's Genetics Computer Group republishes much of that data in a user-friendly format. "Our mission is to make available the data and the tools to look at it in readily accessible form," explains Devereux. "We also provide a registry for our customers' proprietary sequences. We try to support people who are doing clinical work."

Incyte Pharmaceuticals takes a similar approach. "We've always viewed our mission in life as taking all the information in the public domain and adding value to it," says Randal Scott, Incyte's president and chief scientific officer. Thus, in 1991 the company began its effort to discover new genes from white blood cells. "Gradually," Scott recalls, "we realized that we could expand that activity and conduct a comprehensive analysis of the human body. We could then get a partial sequence for every gene in the human genome. Having large databases means that we at Incyte have spent a large amount of time developing new algorithms for data mining in order to discover novel genes."

Understand How Proteins Work

Eisenberg points to the basic premise of data mining. "The information from the HGP is just a sequence of letters or a series of colored dots in an array," he states. "What does this mean in terms of how organisms work? That understanding will only come as we learn the functions of proteins." Just under a year ago, Eisenberg and his colleague Todd Yeates set up a company, Protein Pathways, to obtain some of that understanding. The company aims to exploit new computational technologies that they had developed at UCLA. The first approach has been dubbed 'phylogenetic profiling'. "Proteins that operate together as part of a complex, or a shared metabolic pathway would be expected to be inherited together in whatever genomes contain them, or to be simultaneously absent in organisms that do not require them," explains Yeates. "So you can take any particular protein and look for its presence or absence in any of the completely sequenced genomes. That pattern gives you a descriptor for that protein. If you see two proteins with similar patterns, chances are that they have a relationship in the cell. Often, you have a protein that's uncharacterized. As that protein evolves with a set of other, characterized, proteins, we can infer the function of the unknown proteins."
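The presence-or-absence pattern Yeates describes lends itself to a short illustration. The following is only a sketch under stated assumptions: each genome is reduced to a set of protein family labels, the labels and organism names are invented toy data, and similarity is measured by counting mismatched positions. The function names and the `max_mismatches` parameter are hypothetical and are not taken from the UCLA work.

```python
def profile(protein: str, genomes: dict[str, set[str]]) -> tuple[bool, ...]:
    """Presence/absence pattern of one protein across all genomes."""
    return tuple(protein in families for families in genomes.values())


def mismatches(p: tuple[bool, ...], q: tuple[bool, ...]) -> int:
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def likely_partners(query: str, genomes: dict[str, set[str]],
                    max_mismatches: int = 0) -> list[str]:
    """Proteins whose profile is within `max_mismatches` of the query's."""
    all_proteins = set().union(*genomes.values())
    query_profile = profile(query, genomes)
    return sorted(
        p for p in all_proteins
        if p != query
        and mismatches(profile(p, genomes), query_profile) <= max_mismatches
    )


if __name__ == "__main__":
    # Invented toy data: which protein families occur in which genome.
    toy_genomes = {
        "org1": {"A", "B", "C"},
        "org2": {"A", "B"},
        "org3": {"C", "D"},
        "org4": {"A", "B", "D"},
    }
    # A and B are present and absent in exactly the same genomes.
    print(likely_partners("A", toy_genomes))
```

In this toy data set, proteins A and B share an identical profile, so an uncharacterized A would be tentatively linked to B. A real analysis would need many more genomes before a shared pattern became convincing.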

Last summer, Edward Marcotte and the Protein Pathways team reported another way of using genetic data to understand the functions of proteins. The "Rosetta Stone method" stems from the observation that two functionally related proteins in one organism often can be found fused together into a single protein chain in another organism. Given a large number of genomes, scientists can find a large number of such evolutionary fusion events. "If Protein A is uncharacterized but is fused to Protein B in another organism, there's generally some sort of relationship," says Yeates.
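The fusion idea can be sketched in a few lines as well. This is a simplified illustration rather than the published method: it assumes that every protein has already been assigned one or more family labels by some earlier similarity search, and the labels, protein names, and the function `fusion_links` are all hypothetical.

```python
from itertools import combinations


def fusion_links(query_families: set[str],
                 other_proteins: dict[str, tuple[str, ...]]
                 ) -> dict[tuple[str, str], list[str]]:
    """Find pairs of query families that occur fused in a single protein elsewhere.

    `query_families` holds the families seen as separate proteins in the
    organism of interest. `other_proteins` maps protein names from other
    organisms to the families found along each protein chain.
    """
    links: dict[tuple[str, str], list[str]] = {}
    for name, families in other_proteins.items():
        # Keep only families that also exist in the query organism.
        shared = sorted(set(families) & query_families)
        for pair in combinations(shared, 2):
            links.setdefault(pair, []).append(name)
    return {pair: sorted(names) for pair, names in links.items()}


if __name__ == "__main__":
    # Invented toy data: famA and famB are separate proteins in our organism.
    ours = {"famA", "famB", "famC"}
    elsewhere = {
        "y1": ("famA", "famB"),  # a single chain carrying both families
        "y2": ("famC",),
        "y3": ("famB", "famX"),  # famX is unknown in our organism
    }
    print(fusion_links(ours, elsewhere))
```

Here the fused protein y1 links famA and famB, the kind of evidence Yeates describes for relating an uncharacterized Protein A to a known Protein B.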

Rethinking Biology

The Human Genome Project has directly or indirectly stimulated several technological developments. Realizing the extraordinary value of accessing every human gene and variation, pharmaceutical companies have worked hard to develop HGP technology to the benefit of their drug discovery programs. The explosion in potential drug targets, for example, has done much to stimulate the growth of high-throughput screening technology.

The impact of the HGP on technology actually goes one step further. The project has a fundamentally inclusive goal. It seeks all, rather than just some, of the human genome sequence. This aggressive, comprehensive approach has set a standard for biological research that has been extended to the examination of gene function (also known as functional genomics), protein structure (proteomics), and even cellular structure (cellomics). Once the life science community embraced the idea of understanding human genetic structure in all its multiplicity, it began to carry that concept into other areas. The long-term effects of this conceptual approach will undoubtedly transform biology. "Now is a great time for both academia and industry to be doing research into the molecular basis of life and disease," says Incyte's Scott.

The Impact on Society

The Human Genome Project had the original goal of producing both a map and a detailed sequence of the entire human genomic structure. In one sense, the project could be - and was - viewed as the practical development of a research tool. It was an exalted, difficult, and demanding project that had a specific goal, not unlike the original moon landing.

During its early years, the project did focus on the technical and logistical challenges of reaching that ultimate objective. Research teams devised mapping strategies; evaluated available data-handling capacities and tools; developed new capabilities as necessary; invented and improved instruments for sequencing DNA; and even gave some consideration to laboratory management practices. In recent years, however, the HGP has become diffuse. Unlike the moon-landing program, the project's nature and goals have evolved as the effort has progressed. At the same time, its impact on human society has broadened dramatically, if quietly.

On one rather narrow technical axis, the genome community has changed its focus from the human genome to genomes in general. Scientists have extended their reach and their technologies to E. coli, yeast, nematodes, Drosophila, mice, corn, Arabidopsis, and countless microbial species. The principle here is simple: What is good for people is good for every other organism. If we are to find an underlying truth in base pairs of DNA, then we should certainly look in the base pairs of every important organism.

On another axis, life scientists have evolved from viewing the sequencing of a genome as a huge challenge to regarding it as a routine research practice. They saw the power of the new technology, and they believed. Life scientists have placed great faith in their abilities to generate, analyze, assemble, store, and disseminate billions of bases of genetic information for researchers around the world. They are firm in their belief that this information will provide fundamental biological insights into human beings and the model organisms whose genomes are sequenced. In fact, very few areas of modern biology, from biochemistry to epidemiology, can or should ignore the opportunity to include genetic information in their definition. Genetics has become ubiquitous, and it will be used ubiquitously.

Also deriving directly from the focus on genomics is a progressive conversion in our thinking to more integrative models and methods. The subdivision principle of divide-and-understand has long stood tall at the center of biological research. Genes, proteins, and cellular components have all been dissected from their whole environment for intensive individual study. To classical biologists, this process destroys the very substance of life, which lies in the integration of all the parts into a living, functional whole.

Because life scientists have had to pick their way through the vast complexity of biology, the reductionist approach was the only feasible one. But as they have come to command greater and greater proportions of human genetic material, gene sequencers have started to think in broader, more integrated ways. Where scientists once analyzed the biological expression of a single gene, or, at most, a couple of handfuls of genes, they can now analyze hundreds, thousands, and tens of thousands of genes at the same time. In principle - although not yet in practice - we could take a snapshot of the activity of every single human gene. Doing so would give us a total and complete picture of human genetics in action.

An important feature of the "-omics" approach is that it permits scientists to study populations without sacrificing the ability to analyze any individual component. Comprehensive new models make it possible to integrate genetic analysis from the single gene to the entire population simultaneously, and to link all the parts simultaneously. Similar integration is in sight for proteins. While the paradigms remain to be developed, the prospects are truly staggering.

And this is only the beginning. "The question of the genome opened up when I started in graduate school," recalls Fred Blattner. "Now, we're getting to the end of the discovery phase and into production - and perhaps into more discovery. It's a little bit like being a physicist a century ago."

Peter Gwynne is a freelance science writer based on Cape Cod, Massachusetts.
Guy Page is managing director of Ferguson Forth Page, a consulting firm in Madison, Wisconsin.

Note: Readers can find out more about the companies listed by accessing their sites on the World Wide Web (WWW). If the listed company does not have a site on the WWW, or if it is under construction, we have substituted that company's main phone number. Every effort has been made to ensure the accuracy of this information.

Genome Technologies

Amersham Pharmacia Biotech AB

Beckman Coulter, Inc

Bio-Rad Laboratories

Celera Genomics, Inc

Genome Sequencing Center, Washington University

Incyte Pharmaceuticals, Inc

Massachusetts Institute of Technology

National Human Genome Research Institute

Oxford Molecular Group, PLC

Perkin Elmer Biosystems

Protein Pathways, Inc

Pyrosequencing AB


University of Iowa

University of Wisconsin

The companies in this article were selected at random. Their inclusion in this article does not indicate endorsement by either AAAS or Science, nor is it meant to imply that their products or services are superior to those of other companies.

This article was published as a special advertising supplement in the 28 January issue of Science