This Special Advertising Feature is brought to you by AAAS OPMS

DOI: 10.1126/science.opms.p1300075

Microbiomics: The Germ Theory of Everything


Fast, cheap DNA sequencing technology now allows scientists to study unculturable microbes; the results are challenging some of biology's most fundamental notions.

By Alan Dove

Inclusion of companies in this article does not indicate endorsement by either AAAS or Science, nor is it meant to imply that their products or services are superior to those of other companies.

In the 19th century, some biologists began espousing an apparently absurd theory: that diseases were caused not by poor hygiene and foul vapors, as everyone knew they were, but by organisms too small to see with the naked eye. Pioneering researchers working on this strange idea developed sterile culture techniques, improved microscopes, and created other cutting-edge tools. Gradually, their results convinced their colleagues that the new germ theory of disease, odd as it seemed, was true.

In the 21st century, some biologists have begun espousing an even more absurd theory: that humans and other macroorganisms are not individual entities, as everyone knows they are, but complete ecosystems dependent on billions of microbes. Pioneering researchers working on this unusual idea have developed novel sampling strategies, powerful new gene sequencing and data analysis techniques, and other innovative technologies. Gradually, their results are convincing a new generation of scientists that the microbiomic theory of life, odd as it seems, may be true.

"There are more microbial genomes within us than we have human cells. We're a walking ecosystem. That's a pretty profound reality," says Timothy Harkins, director of research and development at Life Technologies in Carlsbad, California.

The Uncultured Majority

Microbiologists have long known that there are many bacteria, fungi, protozoans, and viruses that won't grow in the lab with current culturing techniques. Now the plummeting prices and skyrocketing sensitivity of next generation DNA sequencing technologies are finally letting researchers study this unculturable majority.

Most studies in this emerging field consist of sampling an environment, sequencing as much of the DNA in the sample as possible, and using the sequence information to identify the organisms in it and possibly their ecological functions. The results can be surprising. For example, microbiome analyses of the human gut have revealed that each person's large intestine carries a unique mix of bacterial species, and that perturbations of this intestinal ecosystem may cause severe illness and even starvation.

Though DNA sequencing forms the backbone of microbiomics, even the best sequencing protocols are useless without careful sampling and experimental design. "Sequencing is exciting and it's very interesting, and people have frequently ... said 'let's just sequence everything and sort it out later,'" says Jonathan Eisen, professor of evolution and ecology at the University of California in Davis, California. Eisen argues that this approach glosses over some crucial questions: "Do you want living cells? Do you want dead cells too? It's a pretty coarse tool to just say, 'I'm going to look at DNA.'"

Besides adding confounding factors such as dead cells and host DNA, poor sampling can destroy some of the genomes researchers wanted to find in the first place. "When [an anaerobic] bacterium is exposed to an oxygen environment it goes into apoptosis and kills itself, and shreds its genome, so how do you characterize something like that?" asks Harkins.

Once they've determined how to collect a useful sample, investigators need to decide exactly what questions they intend to ask, and how they want to frame the answers. Fortunately, zoologists and botanists have been studying ways to characterize ecosystems for decades. Unfortunately, they haven't reached much agreement.

To classify the organisms, biologists can take either a taxonomic approach, making a list of the species that are present and sorting them by their characteristics and the niches they occupy, or focus on constructing phylogenetic trees based on evolutionary relationships. Both methods have adherents and detractors. "People have been arguing about this for a hundred years," says Eisen, a member of the phylogenetic camp.

Quantifying the diversity in ecosystems is somewhat more straightforward. Ecologists generally measure three types of diversity: alpha diversity, based on the number of species or phylogenetic groups in a specific area; beta diversity, which compares diversity between different areas; and gamma diversity, which uses alpha and beta to account for the total biodiversity of a large ecosystem. In medical microbiomics, researchers often measure the alpha diversity within a single person's microbial sample, and calculate beta diversity between the microbiomes from different people.
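As a rough illustration of these three measures, the sketch below computes simple versions of each from per-sample counts. The genus counts are made up for the example, and the specific metrics chosen (richness and the Shannon index for alpha diversity, Bray-Curtis dissimilarity for beta diversity, pooled richness for gamma diversity) are common conventions assumed here, not ones named by the researchers quoted in this article.

```python
import math

def richness(counts):
    """Alpha diversity as richness: the number of taxa with nonzero counts."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts):
    """Alpha diversity as the Shannon index, H = -sum(p * ln p)."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

def bray_curtis(a, b):
    """Beta diversity as Bray-Curtis dissimilarity between two samples.

    0 means identical composition; 1 means no shared taxa.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))

def gamma_richness(samples):
    """Gamma diversity as the number of distinct taxa across all samples."""
    pooled = set()
    for sample in samples:
        pooled |= {t for t, c in sample.items() if c > 0}
    return len(pooled)

# Hypothetical read counts per genus for two people (illustrative only).
person_a = {"Bacteroides": 50, "Faecalibacterium": 30, "Escherichia": 20}
person_b = {"Bacteroides": 20, "Prevotella": 60, "Escherichia": 20}

print("alpha (richness):", richness(person_a), richness(person_b))
print("alpha (Shannon, person A):", round(shannon(person_a), 3))
print("beta (Bray-Curtis):", round(bray_curtis(person_a, person_b), 3))
print("gamma (pooled richness):", gamma_richness([person_a, person_b]))
```

In this toy case each person carries three genera, the pair shares only two of them, and the pooled community contains four.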

Covering the Bases

After settling on an experimental design, microbiomics researchers move to the sequencing phase, where they face another major choice: sequencing ribosomal RNA (rRNA) or sequencing random snippets of whole genomes.

In rRNA sequencing, investigators use primers designed to amplify only the genes for 16S ribosomal RNA, a molecule that has changed very slowly throughout evolution. The number of different rRNA sequences in a sample is a good proxy for the number of species, and public databases of such sequences can be used to identify many organisms. Shotgun or metagenomic sequencing, in contrast, involves sequencing short, random pieces of all of the genomes in a sample, then trying to piece them together afterward.

Each method has advantages and drawbacks. "There are folks who like the shotgun approach I think because overall it can be easier, [but] the one thing the ribosomal RNA approach offers is [it's effectively] an enrichment technique," says Todd Arnold, head of research and development at 454 Life Sciences, a Roche Company in Branford, Connecticut. Ribosomal RNA sequencing is particularly useful for samples from human microbiomes, as the technique makes it relatively easy to ignore the enormous background of host DNA and focus only on the microbial components. Researchers in the field also regard rRNA sequencing as the more mature technology, with better-defined procedures and clearer equipment choices.

However, metagenomic sequencing can identify a much wider range of variations across entire genomes, and may eventually enable scientists to sequence whole genomes in mixed samples. "I think there are aspects of metagenomic analysis which are becoming more routine, more off-the-shelf, [and] there's a lot more that could be learned from metagenomic data," says David Relman, professor of microbiology and immunology at Stanford University in Stanford, California.

Fortunately, sequencing equipment makers show no signs of resting on their laurels, and several companies are vying to increase their machines' performance for both types of efforts. Though many microbiomics researchers have settled recently on Illumina's HiSeq system for rRNA projects, Eisen is quick to point out that sequencing technology is still changing rapidly. "I don't think it's reached a plateau," he says, adding that genome sequencing for metagenomics studies is particularly ripe for new breakthroughs.

Regardless of the platform they choose, biologists can expect the technology to be relatively user-friendly. "Sequencing is no longer seen as a skill for those who are very, very good at sequencing, it's no longer in core labs," says Arnold. Instead, modern high throughput sequencing machines are highly automated and include software that analyzes the raw data and performs initial quality control checks on it.

As a few early applications of microbial sequencing have reached the heavily regulated world of medicine and drug development, some gear makers have taken the automation a step further. For example, the Applied Biosystems Microseq platform performs traditional Sanger sequencing to identify bacterial and fungal contaminants in pharmaceutical manufacturing facilities. The system streamlines sequencing to detect a handful of specific organisms quickly and accurately, rather than probing the entire microbiome for all of the species present.

Down the Data Mine

Streamlined data and simple answers are the opposite of what basic researchers get from microbiome samples, though. "[Microbiomic] studies involve deep sequencing and are data-intensive—for both data storage and data analysis," says Susan Knowles, senior marketing manager for microbiology at Illumina in San Diego, California. With the field still in its infancy, most of the software for analyzing those data comes from the investigators themselves. "There [is] a range of open source tools that researchers use for [microbiomics]," says Knowles, adding that "most of these data analysis tools require some bioinformatics skills."

For scientists who are new to high throughput sequencing, the sheer quantity of information coming out of a sequencing machine can be a shock. 454's Arnold says that the deluge of data often flummoxes newcomers.

In projects that focus on rRNA, investigators can take advantage of the simplified pool of possible sequences and large databases of known rRNA genes. Because this technique is more established than shotgun microbiome sequencing, the software for analyzing rRNA is also somewhat easier to use. Depending on the information experimenters hope to find, they may be able to complete a simple rRNA project without having to hire—or become—bioinformaticians.

Shotgun sequencing is a different story. "The methods to do that with metagenomics are much more complex," says Eisen. In a metagenomics study, scientists have to identify putative genes from fragmentary sequences, determine what families those genes belong to, and try to identify what organisms they come from. Each step presents unique and serious challenges.

Microbiomic data analysis also raises a question that has vexed biologists for centuries: Exactly what is a species? Botanists and zoologists have reached some tentative definitions, but in bacteria, promiscuous gene swapping and rapid evolution make the concept trickier. For viruses, it may not apply at all. Worse, microbiomics itself could undermine traditional views of species distinctions. If each organism's microbial ecosystem drives crucial parts of its biology, where does one individual end and the next begin?

To avoid being bogged down in philosophy, many microbiome researchers have settled on the idea of operational taxonomic units (OTUs), a practical, gene sequence-based analog of the species concept. Sequences that diverge beyond a certain threshold fall into distinct OTUs.
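A minimal sketch of the idea, assuming the reads have already been aligned to the same length: each sequence joins the first OTU whose seed it matches closely enough, and otherwise founds a new OTU. The greedy strategy, the function names, and the default 97 percent identity cutoff are illustrative assumptions; the article does not specify a threshold, and production tools use more careful alignment and clustering.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, threshold=0.97):
    """Group sequences into OTUs by similarity to each OTU's seed sequence.

    A sequence joins the first OTU whose seed it matches at or above
    `threshold`; if none qualifies, it becomes the seed of a new OTU.
    """
    otus = []  # list of (seed, members) pairs
    for s in seqs:
        for seed, members in otus:
            if identity(s, seed) >= threshold:
                members.append(s)
                break
        else:
            otus.append((s, [s]))
    return otus

# Toy 20-base "reads" (hypothetical); a looser cutoff suits such short strings.
read_1 = "ACGT" * 5
read_2 = "T" + read_1[1:]   # one mismatch against read_1: 95% identity
read_3 = "TTTT" * 5         # 25% identity to read_1

otus = greedy_otus([read_1, read_2, read_3], threshold=0.90)
for seed, members in otus:
    print(seed, "->", len(members), "member(s)")
```

Here the first two reads fall into one OTU and the third founds its own. Raising the threshold to 0.97 would split the first two reads apart, which shows how strongly the OTU count depends on the cutoff chosen.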

The European-funded MetaHIT project has recently used another approach: characterizing the human intestinal microbiome based on the putative functions of the different gene families in the sample, rather than by the species carrying those genes. Eisen explains that "they were some of the first people to do beta diversity and alpha diversity of function," just as biologists have done with taxa. In this view, the organisms' contributions to the ecosystem are what matter, not their identities or evolutionary origins.

After deciding how they want to view an ecosystem, researchers need to account for the inherent biases in their data. "There is no unbiased approach. The real challenge and important goal is to understand how biases arise in data, and what [one can] do to either minimize them or control for them," says Relman. As examples, exposing samples to oxygen will eliminate obligate anaerobes, sequencing DNA will ignore RNA-containing viruses, and gentle extraction techniques may fail to lyse durable fungal or bacterial spores.

Going Long

While early pioneers in microbiomics continue sorting out the best ways to ask the scientific questions, engineers and equipment makers are trying to address some of the remaining technical needs. One of the top items on the agenda is longer sequence reads. "The longer the read the better," says Harkins. He adds that "300 bases seems to be a good sweet spot, 300–350, and then the next sweet spot is when you get up to say 600, 800 base pairs." Those lengths allow researchers to map microbial diversity at distinct levels of resolution, with greater lengths providing finer-grained separations between species.

Longer rRNA sequence reads allow researchers to distinguish organisms more clearly on a phylogenetic tree or taxonomic list, and longer metagenomic reads make it easier to assemble larger portions of each organism's genome. Companies are also trying to make their sequencing systems more efficient at handling multiple samples, which is especially important for large clinical studies that try to identify variations in microbiomes across a human population.

Scientists are also trying to define clear methods and controls to ensure reproducible results. That's turned out to be a thorny problem. "People have not tested a lot of the methods that are being used, they push a button and they run them, and we do the same thing," says Eisen. Highly automated sequencing systems make it easy to produce results, but without clear guidelines for data analysis it's unclear what those results mean. To try to establish a reference point, Eisen and his colleagues recently created a completely artificial microbiome by mixing known bacterial species that would never encounter each other in nature. "We shotgun sequenced them with different methods, and we tested how much we could actually figure out about a system [for which] we knew the answer. Even in that relatively simple artificial system some parts were very hard," he says.
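The logic of such a benchmark can be sketched in a few lines: compare what a pipeline reports against the known composition of the mock community. Everything below is hypothetical, including the species labels, the counts, and the choice of total variation distance as the error score; it is not the analysis Eisen's group performed.

```python
def relative_abundance(counts):
    """Convert raw counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}

def compare_to_truth(expected, observed):
    """Score a pipeline's output against a community of known composition."""
    exp = relative_abundance(expected)
    obs = relative_abundance(observed)
    taxa = set(exp) | set(obs)
    # Half the summed absolute differences (total variation distance):
    # 0 means a perfect match, 1 means completely disjoint communities.
    error = 0.5 * sum(abs(exp.get(t, 0.0) - obs.get(t, 0.0)) for t in taxa)
    return {
        "missed": sorted(set(exp) - set(obs)),    # present but not detected
        "spurious": sorted(set(obs) - set(exp)),  # reported but never added
        "abundance_error": error,
    }

# Hypothetical mock community mixed in equal parts, and a pipeline's estimate.
expected = {"species_A": 25, "species_B": 25, "species_C": 25, "species_D": 25}
observed = {"species_A": 40, "species_B": 30, "species_C": 20, "species_X": 10}

report = compare_to_truth(expected, observed)
print(report)
```

In this made-up run the pipeline misses one organism, reports one that was never added, and distorts the abundances of the rest.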

Classical microbiology may be able to help. Relman says that the new sequencing technologies, plus ongoing efforts to characterize bacterial environments in more detail, are enabling investigators to determine the culturing requirements for previously unculturable organisms. Growing these microbes in the lab makes it much easier to study them.

More exploratory surveys of microbiomes should also clarify some of the field's boundaries, especially in medicine. For example, an ongoing project to study the lung microbiome has identified wide microbial population variations in the lungs of healthy individuals. "What is a healthy microbiome? We don't know yet really, we're just scratching the surface," says Harkins.

Meanwhile, microbiome sequencing and data analysis continue getting simpler and cheaper, so that even undergraduates and nonscientists can now study microfauna almost as easily as fauna. "You'd be amazed at how many people are just starting to do this with ribosomal RNA. It's a little bit different than what you might do with a set of binoculars and a bird book, but they get it," says Eisen.

Featured Participants

454 Life Sciences/Roche


Life Technologies

Stanford University
University of California, Davis


MetaHIT Project

Note: Readers can find out more about the companies and organizations listed by accessing their sites on the World Wide Web (WWW). If the listed organization does not have a site on the WWW or if it is under construction, we have substituted its main telephone number. Every effort has been made to ensure the accuracy of this information.

Alan Dove is a science writer and editor based in Massachusetts.

This article was published as a special advertising feature in the 10 May 2013 issue of Science magazine.

New Products: Microbiomics: The Germ Theory of Everything



The Accel-NGS DNA Library Kit is the first product in a new line of kits for next generation sequencing (NGS) sample preparation. NGS users can now produce polymerase chain reaction (PCR)-free libraries with as little as 5 ng of input DNA. The highly efficient Swift adaptation technology eliminates the need for PCR, thereby minimizing base composition bias and fidelity issues while reducing the input requirement. The unique, two-step adaptation process also reduces adapter dimer formation to maximize sequencing output. Unlike other kits, the Accel-NGS DNA Library Kit does not require intact double-stranded DNA, making it ideal for FFPE and damaged samples. The Accel-NGS DNA Library Kit protocol is fast, requiring only 75 minutes start-to-finish, and consists of five easy steps, two of which are bead-based separations that eliminate the need for time-consuming, electrophoretic gel-based size selection. In addition, the streamlined Accel-NGS workflow can be readily automated.

Swift Biosciences
For info: 734-330-2568


The NEBNext Microbiome DNA Enrichment Kit uses a novel method to separate microbial DNA from human host DNA, thereby reducing the prohibitively high cost of sequencing microbiome DNA to a practical level. Microbiome samples are commonly dominated by host DNA (up to 99%). This complicates genetic analyses of these samples, particularly total microbiome DNA sequencing. Since only a small percentage of sequencing reads pertain to the microbes of interest, obtaining sufficient sequence coverage of the microbiome DNA becomes cost-prohibitive or even technically infeasible. The NEBNext Microbiome DNA Enrichment Kit utilizes the MBD2-Fc protein, which binds to CpG-methylated DNA (including human genomic DNA) with very high specificity. The MBD2-Fc protein is attached to Protein A Magnetic Beads, enabling quick and easy removal of the contaminating host DNA in about 30 minutes. The microbial DNA-enriched sample is then ready to be processed for multiple downstream applications, including next generation sequencing, real-time polymerase chain reaction (qPCR), and endpoint PCR.

New England Biolabs
For info: 800-632-5227



The NEXTflex 16S V4 Amplicon-Seq Kit has been developed to simplify bacterial metagenomics studies using Illumina HiSeq and MiSeq platforms. This kit allows users to go from sample to sequence in two hours, making it the fastest library prep kit available. Using specialized NEXTflex primers that target the V4 region of the 16S rRNA gene, a single polymerase chain reaction amplification simultaneously adds the sequencing and barcode regions needed for multiplexing. The entire workflow requires only one clean-up step, maximizing recovery. Ideal for studies of microbiome community composition or comparative metagenomics, the NEXTflex 16S V4 Amplicon-Seq Kit is available with up to 48 barcodes. Higher degrees of multiplexing are available on a custom basis.

Bioo Scientific
For info: 888-208-2246



A new comprehensive suite of SeqCap EZ Reagent kits is available for single or multiplex target enrichment experiments prior to DNA sequencing. Designed to maximize customer convenience and streamline the DNA preparation workflow, the new kits provide customers with an all-inclusive reagent solution for target enrichment experiments such as sequencing the whole exome or large target genomic regions. The new SeqCap EZ Reagent kits are also optimized and validated for use with the SeqCap EZ Library probe pools. The multiplexing capabilities of SeqCap EZ Library allow researchers to manage multiple samples per sequencing run, enabling high throughput research laboratories to capitalize on cost-effective and efficient workflow methods for next generation sequencing. The SeqCap EZ Reagent Kits include the accessory, oligonucleotides, hybridization, and wash kits, all conveniently configured in a 24 reaction pack.

For info: 877-646-2534



Two new additions to the extensive range of automated pipetting systems are uniquely tailored for polymerase chain reaction setup and nucleic acid purification, yet retain the flexibility that allows their use as open systems for diverse automated liquid handling applications. The new epMotion P5073 and M5073 workstations automate and simplify what are traditionally complex, labor-intensive pipetting tasks, saving time and improving the reliability and reproducibility of results. The epMotion M5073 fully automates the process of DNA purification, providing reproducibility with high yield and purity. Setup time is short and elution volumes as low as 25 μL for high concentrations can be used. Eppendorf's MagSep reagent kits are specifically designed for use with the epMotion M5073. These ready-to-use reagents, supplied in a specific tray, eliminate the need for manual buffer handling and require only room temperature storage.

For info: 800-645-3050

Newly offered instrumentation, apparatus, and laboratory materials of interest to researchers in all disciplines in academic, industrial, and governmental organizations are featured in this space. Emphasis is given to purpose, chief characteristics, and availability of products and materials. Endorsement by Science or AAAS of any products or materials mentioned is not implied. Additional information may be obtained from the manufacturer or supplier.
