This Special Advertising Section is brought to you by AAAS OPMS
The techniques designed to unveil the sequence of an organism's genome promised a new world of understanding for biologists. As time passed, the flow of data grew. Data started piling up, like water building into a wave. The crest of data rose higher than ever before, and kept growing. A promise of knowledge turned into an ongoing tidal wave, one that continues to flood labs around the world with data. But data alone cannot change the world of biological knowledge. Marty Rosenberg, vice president of research and development for Promega Corporation, said, "We're accumulating phonebooks of information about the coding information that makes up various organisms. Now, how do we use it?"
Despite the data already swamping lab computers, even more data lie ahead. Richard Durbin, head of the Informatics Division and deputy director of the Wellcome Trust Sanger Institute, said, "Sequencing is not over with. This year other vertebrate genomes are becoming available." He added, "We're getting large volumes of genomic data and that's very computer demanding. Sequence comparison is also computer demanding." To keep up with this pace of data creation, Durbin pointed to continuing improvements in computational resources, including a new generation of algorithms that should help researchers handle whole-genome data sets. Still, software has room to improve. Durbin said, "We've developed methods to find things—gene finding—and methods for comparing things once they're found, like BLAST, which lets you compare two sequences. Now, we want to integrate those two processes." The field will demand even more integration in the future: with growing numbers of sequences from different organisms, scientists could use these data to explore the biological function of genes. Even so, Durbin said, "It's still hard to use comparative data effectively."
In addition, scientists already want to explore the importance of variation in the genome between humans. This will involve even larger data sets. To support these new endeavors, scientists need powerful and sophisticated software programs and supporting computer hardware to manage the raw data and dredge up the valuable information. Many fields of science—from genetics to drug discovery—already call on bioinformatics to capture valuable information in the flood of genomic data. Consequently, the curse of too much data already promises to be the blessing that researchers desired all along—more knowledge and better health.
SPEEDING UP THE SEQUENCING
This data collecting started in 1965 with the sequencing of a yeast tRNA about 80 nucleotides long. Around 1970, the discovery of restriction enzymes and DNA polymerases made the sequencing of DNA possible. New England Biolabs, Promega, Roche Molecular Biochemicals, and other companies quickly produced reagents that opened this field to investigators who were not trained in molecular biology. For the first time, well-defined DNA fragments could be isolated from larger molecules, which enabled the sequencing of a 5.4-kilobase bacteriophage genome in 1976. In 1977, scientists learned to sequence DNA in two new and relatively simple ways: Fred Sanger's dideoxy chain termination method and Allan Maxam and Walter Gilbert's chemical degradation method. In 1981, the Sanger dideoxy method revealed the sequence of human mitochondrial DNA, which consists of about 16,500 base pairs. In 1983, scientists used the Maxam and Gilbert chemical degradation method to sequence the roughly 40-kilobase DNA of bacteriophage T7.
Today, scientists also want faster ways to look for variations in genomes, especially single nucleotide polymorphisms (SNPs). For example, Promega's READIT SNP Genotyping System is used to interrogate SNPs. Rosenberg said, "As we were developing the assay to detect polymorphisms, we observed a very high degree of accuracy." The accuracy is so high that Rosenberg and his colleagues hope to develop spin-offs of the READIT technology for clinical applications. The assay can also perform quantitative analysis: Promega's AluQuant Human DNA Quantitation System is based on the READIT technology. This assay provides human-specific DNA quantitation in the presence of contaminating DNA, which is crucial for forensic DNA analysis. Forensic scientists also use Promega's DNA IQ System to purify genomic and mitochondrial DNA.
Rosenberg said, "The DNA IQ System is being used by life scientists to purify all of the DNA from small samples, or a defined amount of DNA in samples where DNA is in excess. The amount of DNA isolated from a sample can be tailored to the specific downstream application." The product was used for DNA isolation at Ground Zero after the September 2001 terrorist attacks.
ADVANCES IN AUTOMATION
Today's systems for DNA sequencing often come as automated workstations, like the products created by Bio-Tek Instruments, Hamilton Company, and Packard BioScience Company. Bio-Tek Instruments, for example, specializes in automated workstations for research applications. The company provides microplate readers, washers, data reduction software, and automated liquid-handling systems that are used in genomic research. Beckman Coulter, Genomic Solutions, and Zymark Corporation push automation further by creating robotic systems that perform many of the upstream and downstream processing tasks required for DNA sequencing. For example, Beckman Coulter's Biomek 2000 Laboratory Automation Workstation and Biomek FX handle tasks ranging from processing samples for DNA sequencing to running ELISAs. These modular systems move the sample from one stage to the next with a robotic arm. Overall, these systems can automate virtually any liquid-handling procedure, including pipetting, diluting, and dispensing. According to Jim Osborne, vice president of advanced technology, "These devices were designed with high throughput and drug discovery in mind." He added, "The strategic focus of Beckman Coulter is to simplify and automate processes from drug discovery to patient testing." Osborne identified repeatability as one fundamental benefit of automation. He said, "You want to eliminate as much possibility of error in the original data as you can."
ASSAYS AT YOUR REQUEST
The combination of so many assays and the automated systems to run them should strengthen the predictive side of many research areas, including drug discovery. Clifford Baron, director of marketing for global services and solutions at Applied Biosystems, said, "With the completion of the genome, bioinformatics is really maturing from a somewhat speculative arena to a very applied arena, where it can be very robust and predictive." Applied Biosystems' ABI PRISM 7900 Sequence Detection System, for example, will help scientists feed data to their bioinformatics software. This device performs real-time PCR and measures fluorescence in several colors. It can also be used with many off-the-shelf assays from Assays-On-Demand. Spier said, "Now we provide a more complete solution, and our e-commerce website will provide the list of genes of interest and you can select and order assays that are guaranteed to work." Applied Biosystems will push high-throughput analysis further this June, when the company introduces a genotyping array system. In this system, DNA will hybridize to beads at the end of an optical fiber. The system will provide 40-fold replication in searching for any SNP of interest, all in a single operation on a 96-well plate.
LIFE FROM DEATH
Promega created several products aimed specifically at cell viability. The Apo-ONE Homogeneous Caspase-3/7 Assay detects apoptosis in cultured cells: investigators can apply a drug candidate to a culture and then use Apo-ONE to see whether the cells die. According to Rosenberg, "Apo-ONE is sensitive down to at most a hundred cells. We're developing a new assay that will take that sensitivity down to 1 to 10 cells." Promega also provides the CellTiter-Glo Luminescent Cell Viability Assay to assess a potential drug's effects. This one-step assay measures ATP levels using the enzyme luciferase. Rosenberg explained: "When cells die, the first thing that happens is that they deplete their ATP. In this assay, as cells die the lights go out." Promega also uses bioluminescence as the readout in its Dual-Glo Luciferase Assay System, which helps researchers track the expression of two different genes in the same experiment. As a result, you can simultaneously track an experimental reporter (the gene you are trying to stimulate) and a control reporter used to normalize your results. Such an assay should reduce variability in experiments, thereby providing more reliable data.
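Both luminescent readouts described above reduce to simple arithmetic on plate-reader values. The sketch below assumes, as in common dual-luciferase designs, a firefly luciferase experimental reporter and a Renilla luciferase control read from the same well, and it assumes a viability calculation that subtracts a medium-only background and scales to untreated cells. The function names are hypothetical illustrations, not Promega software.

```python
def normalized_ratios(firefly_rlu, renilla_rlu):
    """Per-well ratio of experimental (firefly) to control (Renilla) luminescence.

    Dividing by the control reporter cancels well-to-well differences in
    cell number and transfection efficiency (assumed pairing, one value per well).
    """
    if len(firefly_rlu) != len(renilla_rlu):
        raise ValueError("readings must be paired well by well")
    return [f / r for f, r in zip(firefly_rlu, renilla_rlu)]


def mean(values):
    return sum(values) / len(values)


def fold_change(treated_ratios, control_ratios):
    """Change in normalized reporter signal relative to untreated wells."""
    return mean(treated_ratios) / mean(control_ratios)


def percent_viability(sample_rlu, untreated_rlu, background_rlu):
    """ATP-luminescence viability relative to untreated cells.

    background_rlu is the signal from medium-only (no-cell) wells; fewer live
    cells means less ATP and therefore less light.
    """
    return 100.0 * (sample_rlu - background_rlu) / (untreated_rlu - background_rlu)


if __name__ == "__main__":
    treated = normalized_ratios([200, 400], [100, 200])  # [2.0, 2.0]
    control = normalized_ratios([50, 100], [100, 200])   # [0.5, 0.5]
    print(fold_change(treated, control))                 # 4.0
    print(percent_viability(5500, 10500, 500))           # 50.0
```

The ratio is taken per well before averaging, so a well that simply contains more cells raises both signals and leaves its ratio unchanged.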
CALCULATING THE COMPARISONS
The EMBL Nucleotide Sequence Database consists of DNA and RNA sequences that come from individual researchers, genome sequencing projects, and patent applications. EMBL, GenBank, and DDBJ collaborate on this project: each group collects a portion of the sequence data reported worldwide, and the three exchange data daily. Anyone working in this field recognizes the acronym BLAST, which stands for Basic Local Alignment Search Tool. Madden said that BLAST "calculates similarity for biological sequences." He added: "It uses some smart shortcuts to get to its answer faster and also uses statistical theory to calculate an expected value—basically a false positive rate—that lets the user know how surprising—and hence interesting—the results are." PSI-BLAST can also create a profile while searching one database and then use the same profile to search another database. Madden said that this can produce more sensitive results and find more distant relationships.
Right now, sequence similarity programs face several challenges. Madden said, "The databases are doubling in size every year or less, so speed will always be an issue, and there will always be the need to find faster ways to get the results." As the number of known sequences and the number of complete genomes grow, these programs must become more sensitive while keeping false positives in check; in other words, searches must find more distant true matches without reporting more spurious ones. Madden added, "It is also becoming more important to present the results in a manner that helps the user understand the biology better. Displaying the results in the MapViewer at the NCBI is one way this problem is being addressed." To keep this field moving ahead, a variety of companies create software for searches. For example, Accelrys and Textco offer suites of programs that include BLAST searching and more.
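The "expected value" Madden describes comes from Karlin-Altschul statistics, where the number of chance hits with score at least S is E = K·m·n·e^(-λS) for a query of length m and a database of total length n. The sketch below is a simplified illustration under stated assumptions: it omits the effective-length corrections real BLAST applies, and the λ = 0.267 and K = 0.041 values are the figures commonly tabulated for BLOSUM62 with gap costs 11/1, used here only as example inputs.

```python
import math


def evalue(raw_score, m, n, lam, k):
    """Expected number of chance alignments scoring >= raw_score.

    Karlin-Altschul formula E = K * m * n * exp(-lambda * S), without the
    edge-effect (effective length) corrections used by production BLAST.
    """
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits, m, n):
    """Equivalent form E = m * n * 2**(-S') once the score is in bits."""
    return m * n * 2 ** (-bits)


if __name__ == "__main__":
    LAMBDA, K = 0.267, 0.041  # assumed example parameters (BLOSUM62, gaps 11/1)
    query_len, db_len = 300, 1e8
    e_small_db = evalue(50, query_len, db_len, LAMBDA, K)
    e_big_db = evalue(50, query_len, 2 * db_len, LAMBDA, K)
    print(f"E-value against current database: {e_small_db:.3g}")
    print(f"Same score after database doubles: {e_big_db:.3g}")
```

Because E grows in proportion to database length, the same alignment score looks twice as likely to be a chance hit each time the database doubles, which is one reason growing databases push the search for more discriminating methods.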
SOFTWARE FOR SEQUENCING
In today's laboratory, however, sequencing is often just the first step. Consequently, Lasergene also performs additional tasks. For example, the Protean application predicts and displays the secondary structure of the protein encoded by a sequence. Lasergene's MapDraw supplies restriction mapping and vector drawing. Burland said, "You can add the whole E. coli genome—4.6 megabases—to MapDraw and get a restriction map of it, without any special hardware." This year, DNASTAR is working on a new sequence assembly product that will handle more sequences, and faster. Conventional assembly software ran in quadratic time, meaning that twice as much data demanded four times as much computer run time; Burland said the new system will run in linear time, so that twice the data takes just twice the time. Burland said, "This application will assemble 100,000 sequences on a desktop computer in just two hours." He added, "A lot of genome assemblies out there are not perfectly correct, so investigators could use this assembler for validation or for independent assessment of an assembly."
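A restriction map like the one MapDraw produces lists where each enzyme cuts a sequence. The sketch below is a hypothetical illustration, not DNASTAR's implementation: it scans a linear sequence for three standard recognition sites (EcoRI G^AATTC, BamHI G^GATCC, HindIII A^AGCTT), reports cut positions, and computes fragment lengths. It assumes a linear molecule, so a circular genome such as E. coli would need one extra wrap-around fragment join.

```python
import re

# Recognition site and cut offset within the site (top strand).
# A small illustrative subset; all three sites are palindromic, so
# scanning one strand finds every occurrence.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
}


def restriction_map(seq, enzymes=ENZYMES):
    """Return {enzyme: [cut positions]} for a linear sequence.

    Positions are 0-based indices of the first base after the cut.
    The lookahead pattern lets overlapping sites all be reported.
    """
    seq = seq.upper()
    result = {}
    for name, (site, offset) in enzymes.items():
        pattern = "(?=" + re.escape(site) + ")"
        result[name] = [m.start() + offset for m in re.finditer(pattern, seq)]
    return result


def fragment_lengths(seq_len, cuts):
    """Lengths of fragments from cutting a linear molecule at the given positions."""
    bounds = [0] + sorted(set(cuts)) + [seq_len]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "AAGAATTCTTGGATCCAA"
    sites = restriction_map(dna)
    print(sites)  # {'EcoRI': [3], 'BamHI': [11], 'HindIII': []}
    all_cuts = [pos for cuts in sites.values() for pos in cuts]
    print(fragment_lengths(len(dna), all_cuts))  # [3, 8, 7]
```

Each enzyme costs one pass over the sequence, so the work grows linearly with genome length, which is consistent with mapping a 4.6-megabase genome on ordinary hardware.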
THE COMPUTING CHALLENGES
Investigators at Compaq believe that the computing challenges posed by today's life sciences demand tools created by integrated teams. The Compaq Bioinformatics Solutions Center, for example, includes biophysicists, computational scientists, and others who explore new approaches to data issues. Ty Rabe, director of high performance technical computing solutions at Compaq, said, "One area we work in is bioinformatic applications themselves—how to design, optimize, and run them. We also study how to build the architecture of high performance computing environments and then integrate it with the workloads. In addition, we design data management environments, which are particularly important in bioinformatics." In general, investigators at the Bioinformatics Solutions Center seek repeatable solutions that could benefit a broad base of customers. Rabe said, "Typically we find that if we can solve an overall problem for Celera or the Sanger Institute then we can apply it to many other customers." For the most part, these solutions involve computing infrastructure.
Rabe indicated that more complicated work lies ahead. He said, "It's very clear that in genomics and related applications that we are really at the front end of this—the simple end in terms of complexity. The genome is the lowest level of complexity. The next step is looking at protein structure and understanding how gene-protein interactions might lead to new drugs, and the protein problem is much more complicated than the gene problem." To meet the computational demands of proteomics, Rabe said, computing power must increase by at least an order of magnitude. He added: "From proteins, we move to cells, tissues, and ultimately to organs and whole organisms. I don't think people know what it will take to simulate whole organisms." In the past, the computing industry generally kept pace with Moore's Law, the observation that the number of transistors on a chip doubles roughly every two years, often restated as computing performance doubling every 18 months.
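Rabe's order-of-magnitude target can be put in Moore's Law terms with a little arithmetic. Assuming the 18-month performance-doubling rule of thumb quoted above, a tenfold increase takes 1.5 × log2(10), or about five years of conventional scaling. The helper names below are hypothetical.

```python
import math


def growth_factor(years, doubling_time_years=1.5):
    """Performance multiple after `years`, assuming a fixed doubling time."""
    return 2 ** (years / doubling_time_years)


def years_to_reach(factor, doubling_time_years=1.5):
    """Years needed to multiply performance by `factor` at that doubling rate."""
    return doubling_time_years * math.log2(factor)


if __name__ == "__main__":
    print(growth_factor(3))                # two doublings in three years -> 4.0
    print(f"{years_to_reach(10):.2f} years for a tenfold gain")
```

Changing `doubling_time_years` to 2.0 models the two-year transistor-count form of the law instead.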
Now, Rabe said, fundamental features of nature, including the physical size of molecules and the speed of light, limit how much more computing power can be gained by making components smaller. Consequently, the next step in power could come from massively parallel processing, in which many processors work as a team on a single problem at the same time. Rabe indicated that prototype systems using hundreds of thousands of processors simultaneously should be available in a few years. Looking ahead, he said, "The genomics industry will be as important in a few decades as computers are now."
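Parallel processing depends on splitting one problem into pieces that separate processors can work on and then combining their answers. The sketch below illustrates that split-and-combine pattern on a hypothetical task, counting a motif in a long sequence. Threads stand in for separate processors purely to keep the example self-contained; in CPython they will not speed up this CPU-bound work, and a real system would hand the chunks to separate processes or machines. The key detail is that each chunk reads len(motif) - 1 extra bases past its boundary, so matches that straddle two chunks are counted exactly once.

```python
from concurrent.futures import ThreadPoolExecutor


def count_motif(text, motif):
    """Count occurrences of motif in text, including overlapping ones."""
    count = 0
    start = text.find(motif)
    while start != -1:
        count += 1
        start = text.find(motif, start + 1)
    return count


def count_in_range(seq, motif, begin, end):
    """Count matches that start in [begin, end).

    The window extends len(motif) - 1 bases past `end`, so any match starting
    before `end` is complete, and no match starting at or after `end` fits.
    """
    window = seq[begin:end + len(motif) - 1]
    return count_motif(window, motif)


def parallel_count(seq, motif, n_workers=4):
    """Split seq into n_workers ranges, count each independently, then sum."""
    size = max(1, -(-len(seq) // n_workers))  # ceiling division
    ranges = [(b, min(b + size, len(seq))) for b in range(0, len(seq), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = pool.map(lambda r: count_in_range(seq, motif, r[0], r[1]), ranges)
    return sum(partial_counts)


if __name__ == "__main__":
    genome = "ACGTAAAACGT" * 1000
    print(parallel_count(genome, "AAA", n_workers=8), count_motif(genome, "AAA"))
```

The partial counts are independent, so adding workers only changes how the ranges are cut, not the combined answer.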
PUSHING AHEAD, TODAY
Lincoln takes a broader look at integration, thinking of it as a stool with three legs. Data integration is the first leg. The second leg is genomic integration, which Lincoln called "the actual knowledge of why you're putting that data together and what you'll do with them. For us it's all about taking the tools of the genome and bringing them into real science so that we can use the genome to solve real problems." The third leg of integration, according to Lincoln, is wet lab integration. He said, "No matter what a bioinformatic system will do for you, you're still talking about a computerized system that does the best it can with the caveats and complexity of the data, but the data contradict themselves sometimes. We must tie the tools of bioinformatics into day-to-day wet lab processes so you can test the hypotheses."
WORKING WITH COMPLEXITY
To work with these large data sets, Molecular Mining provides GeneLinker. Version 2.0 Gold of this software came out in January of this year; it helps scientists explore data sets from expression analysis. It uses clustering techniques that arrange expression data into functional groups. When describing clustering, Somogyi said, "You're trying to find patterns of similarity. You can imagine there's a space of possible expression patterns but biology uses only a small volume of that, and those are the clusters. Often within these gene clusters, you find genes from related pathways." The next version of GeneLinker, Somogyi said, will allow scientists to start making predictions: the software will analyze expression data and then pick out the gene sets that seem most predictive of, say, a given disease or drug response. In this way, it will help researchers plan future experimental steps.
Even more exciting work is on the horizon. Somogyi said, "Once we get more time series data, we can extensively reverse-engineer gene regulatory networks. But we need larger data sets to validate that further. While the genomic data flood may provide crippling challenges for some analytic approaches, the performance of our reverse-engineering methods improves with increasing depth and complexity of the datasets our partners provide. Of course, judicious experimental design is a prerequisite here also." The reverse-engineering approach could reveal many pathways merely by looking at their outputs. By exploring these in silico models, scientists can more thoughtfully select which wet lab experiments to perform. Advanced bioinformatic software is also being produced by many other companies, including BioTools.
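Clustering groups genes whose expression patterns rise and fall together. The source does not say which algorithms GeneLinker uses, so the sketch below shows one generic technique under assumed parameters: single-linkage grouping, where two genes land in the same cluster if their profiles have a Pearson correlation of at least a chosen threshold (0.9 here, an arbitrary illustrative value), directly or through a chain of such genes. The gene names and profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / (sd_x * sd_y)


def cluster_profiles(profiles, threshold=0.9):
    """Single-linkage clusters of genes, joined when correlation >= threshold.

    Uses a union-find structure so chains of similar genes merge into one group.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    expression = {  # hypothetical time-course measurements
        "geneA": [1, 2, 3, 4],
        "geneB": [2, 4, 6, 8],
        "geneC": [4, 3, 2, 1],
        "geneD": [8, 6, 4, 2],
    }
    print(cluster_profiles(expression))  # [['geneA', 'geneB'], ['geneC', 'geneD']]
```

Correlation rather than raw distance is used so that genes with the same shape of response cluster together even when their absolute expression levels differ, as geneA and geneB do.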
TAKING ADVANTAGE OF VARIATION
Third Wave Technologies applies its Invader technology to this task. Invader products detect and quantify DNA and RNA. Lance Fors, Third Wave's chief executive officer, said, "Invader is the only technology that doesn't require PCR as part of the sample preparation." He added, "It's precise enough that you can throw it into a large haystack of DNA and detect—with 99.9 percent accuracy—any sequence or sequence variation." The company already offers Invader assays for more than 100,000 unique SNPs. Users of this system receive microtiter plates with 96 or 384 wells and just add their samples. The results can be read with a fluorescent microtiter plate reader, which many laboratories already have. Fors said, "Part of our focus is enabling an understanding of the whole genome and its association with disease, and quickly focusing that into pharmacogenetics."
Through these combinations of technology, today's phonebooks of data will grow into tomorrow's advances. Success involves collecting better data and finding ways to make use of it, and fast. As scientists discovered in the early stages of genomics, it takes an interdisciplinary team to crack codes from genetics to proteins and beyond. Still, the rewards loom large.
Note: Readers can find out more about the companies and organizations listed by accessing their sites on the World Wide Web (WWW). If the listed organization does not have a site on the WWW or if it is under construction, we have substituted its main telephone number. Every effort has been made to ensure the accuracy of this information. The companies and organizations in this article were selected at random. Their inclusion in this article does not indicate endorsement by either AAAS or Science nor is it meant to imply that their products or services are superior to those of other companies.
Science. ISSN 0036-8075 (print), 1095-9203 (online)