This Special Advertising Section is brought to you by AAAS OPMS
The past few years of press coverage for life science research frequently mentioned data, heaps of data, but even more intriguing news lies ahead. A crucial challenge in the future of bioinformatics involves putting that data to work. Now life scientists hope to plan large experiments, collect loads of data, analyze it, compare data between experiments, and eventually combine all of that information to improve basic theories, biotechnology, and medicine. The professionals interviewed for this article believe that bioinformatics will soon deliver on those desires. For some scientists, today’s biology reminds them of physics in the last half of the 20th century, when high throughput data and theory fed complex simulations that opened up new arenas, such as the advances in particle physics. “Biology is now at that same threshold,” says Ajay Royyuru, who manages the Computational Biology Center at IBM Life Sciences. He feels that the increasing access to high throughput data plus rapidly developing theory behind pathways, systems biology, and so on will come together in large-scale in silico models. According to Royyuru, this combination of knowledge will generate an enormous impact. To realize this effect, though, life scientists need tools to make data, keep track of it, run it in models, and more. Royyuru says, “We are on the verge of making phenomenal improvements in our understanding of biology, and this progress is happening because of combining data, theory, and modeling.” A series of new techniques and tools described here will help all biologists feel this forward momentum in bioinformatics.
Computing in Clusters
One computing advance relies on tightly coupled clusters of processors. In other words, many processors can be connected, with high levels of communication between them, to work as a team, as in the IBM P series of supercomputers and the Sun Fire 15K server with Sun Cluster software. Tightly coupled clusters work very well for many applications, including simulating molecular biology, chemical kinetics, protein folding, and so on. The question for the future is: How many processors can be packed into a reasonable space?
Today’s machines use a few thousand processors to churn out a few tens of teraflops, or trillion floating point operations per second (flops). “You can do a good amount of science on these,” says Royyuru, “but there is more science to get to that is not possible with these machines.” He thinks it will take petaflops (1,000 trillion flops) machines to simulate the behavior of proteins, for example. To get that kind of power, computer scientists could simply put more boxes in a room or rethink how they put together processors. Royyuru points out that putting more boxes in a room will become increasingly difficult, because even 2,000 processors in a traditional design occupy the floor space of a basketball court. Scaled up to petaflops, such machines would grow too large and draw too much power.
Instead, IBM looked for new approaches to computer architecture. In project Blue Gene, for example, investigators at IBM and the Lawrence Livermore National Laboratory converged on so-called cellular architecture, which mimics the way biological structures are composed. Blue Gene should crank out 6 teraflops with a single rack of equipment, about one thousand processors. This year, scientists at IBM’s Watson Research Laboratory hope to put together half a rack of Blue Gene chips. “We learn as we go along,” Royyuru says, “and we intend to make this hardware relevant to biological research.”
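To put the rack figures in perspective, the short Python sketch below works out how many Blue Gene racks it would take to reach one petaflops. It uses only the numbers quoted above (6 teraflops and about one thousand processors per rack) and assumes, purely for illustration, that performance adds up linearly across racks; the function and constant names are hypothetical.

```python
import math

# Figures quoted in the article.
TERAFLOPS_PER_PETAFLOPS = 1_000   # 1 petaflops = 1,000 trillion flops
RACK_TERAFLOPS = 6                # projected output of one Blue Gene rack
PROCESSORS_PER_RACK = 1_000       # "about one thousand processors" per rack


def racks_for(target_teraflops, rack_teraflops=RACK_TERAFLOPS):
    """Whole racks needed to reach the target.

    Assumption (not from the article): throughput scales linearly
    with the number of racks.
    """
    return math.ceil(target_teraflops / rack_teraflops)


racks = racks_for(TERAFLOPS_PER_PETAFLOPS)
processors = racks * PROCESSORS_PER_RACK
print(f"{racks} racks, roughly {processors:,} processors")
```

Under that linear-scaling assumption, a petaflops machine would need on the order of 167 racks and about 167,000 processors, which gives a sense of why packing density matters so much.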
Getting on the Grid
Research at Sun shows the need for grids. Loralyn Mears, Sun’s market segment manager for the life sciences, says, “We did a study and found that the average biotech company has five months’ worth of projects, and most companies had to run those serially. Worse yet, the average company used only 22 percent to 25 percent of the available compute cycles.” With Sun’s Grid Engine technology, Mears says, those companies could use 99 percent of the available cycles, without buying one more computer or taking up any additional space. She adds that the Sun One Grid Engine software scales up easily from a dozen computers to a thousand. “If someone is reasonably information-technology savvy,” Mears says, “he or she can configure a grid in one day with the Sun One Grid Engine. And it can connect machines from Sun, IBM, Silicon Graphics, most anything.”
In addition, an open-source version of this software is free. Nevertheless, a company that wants to assign policy restrictions to a grid will need the Enterprise edition, which must be purchased. So far, about eight thousand installations of Sun’s software run grids around the world, and Mears says, “Life scientists are the biggest consumer group.” For example, Oxford GlycoSciences struggled with BLAST searches that took up to three months to complete, but after installing Sun One Grid Engine software the searches now take about a week.
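The utilization and BLAST figures above translate into rough speedups, which the following Python sketch computes. It is a back-of-the-envelope illustration, not Sun’s methodology: the function name is hypothetical, the 30-day month is an assumption, and “up to three months” is treated as the worst case.

```python
def utilization_gain(before, after=0.99):
    """Ratio of usable compute cycles after versus before adopting a grid."""
    return after / before


# The article quotes 22 to 25 percent utilization before, 99 percent after.
gain_low = utilization_gain(0.25)
gain_high = utilization_gain(0.22)

# BLAST example: up to three months before, about one week after.
DAYS_PER_MONTH = 30               # assumption for illustration
blast_before_days = 3 * DAYS_PER_MONTH
blast_after_days = 7
blast_speedup = blast_before_days / blast_after_days

print(f"Usable cycles: {gain_low:.1f}x to {gain_high:.1f}x more")
print(f"BLAST turnaround: about {blast_speedup:.0f}x faster")
```

By this estimate the same hardware delivers roughly four times as many usable cycles, while the quoted BLAST turnaround improves by about an order of magnitude in the worst case.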
Helping Data Make a Difference
For example, DeSesa says, “Clementine is a visual data mining environment which contains a suite of tools that allow the user to access multiple data sources and data types, do transforms and other data preprocessing, utilize a variety of exploratory visualization tools, apply one or more predictive modeling methods and finally, deploy an entire solution.”
Today’s data, though, often involves more than numbers. For example, a scientist might apply data mining to entire articles, which LexiQuest Mine, also from SPSS, can do. “You can look at the content, or the important concepts, in large collections of journal articles, efficiently find the concepts that matter to you, structure that text-based data, and then link that information with your data set. Scientists need to use all of the available data, and much of that data is in the form of unstructured text,” DeSesa says.
DeSesa and her colleagues help scientists put data to use. She says, “Finally, when we are satisfied with our data mining results, we ask: How do we get this new knowledge into our daily lives?” The final phase of data mining is the deployment of the analysis. SPSS helps customers through this general process, even creating pilot projects to show customers how to use the tools in their specific domains.
Software for Sequencing
DNASTAR, for instance, makes the Lasergene suite. According to John Schroeder, vice president of research and development at DNASTAR, this software performs many tasks: sequence assembly and finishing, primer design, gene discovery and annotation, sequence pair and family alignment with phylogeny, restriction site analysis and mapping, and protein structure analysis. “Basically, Lasergene provides a wide range of functionality,” Schroeder says. He adds that more than three thousand research articles mention using this software.
A variety of features makes this package so widely used. First, it works with most of the major file formats. Schroeder says, “Our priority is ease of use, and we want this to be as convenient a package as possible.” For example, this package lets a user drag-and-drop whole folders of sequences.
DNASTAR offers other products, too. GenVision, a DNASTAR plug-in for Adobe Illustrator, helps scientists visualize expression data, functional comparisons, genome presentations, and more. StarBlast, on the other hand, stores data and can be used to publish a sequence online.
Integrated Analysis
“No one company covers all aspects of proteomics,” says John Schneider, vice president of marketing at Amersham Biosciences. Protein separation by two-dimensional difference gel electrophoresis (DIGE) is one of Amersham Biosciences’s specialties. Its Ettan DIGE system multiplexes dye molecules to study more than one protein sample in a single gel and detects changes of as little as 10 percent in protein abundance levels. The DeCyder software package guides a user through a DIGE experiment and its analysis. Schneider says, “DeCyder is a very successful, leading edge informatics application for 2D processing, which means image processing, statistical accuracy, and experiment support.”
The increasing scale of experiments also demands advanced infrastructure to keep up with advanced approaches to proteomics. The Scierra Laboratory Workflow Systems from Amersham Biosciences provide data collection and analysis from various applications, including sequencing, microarrays, and proteomics. Schneider says, “This system tracks all of the components involved in manufacturing data, and it is all integrated on one platform.”
Integration also impacts the bioinformatics behind DNA and protein arrays. For example, Iobion Informatics makes GeneTraffic software for microarray analysis. Jason Goncalves, cofounder and chief scientific officer at Iobion, says essentially every step in microarrays, from designing them to analyzing them, requires informatics. He also pointed out ongoing work that integrates expression data with other types of data, such as related literature. For example, PathwayAssist, a Stratagene product, supplies literature-mining tools that enable scientists to extract functional associations from the literature.
The Future of Pharmaceuticals
In general, Celera takes three approaches to drug discovery. One involves proteomics. For example, Beasley says, “We look for differential expression of cell surface proteins related to cancer.” Celera scientists also use genetics for drug discovery. Investigators from Celera and Celera Diagnostics, for instance, recently resequenced 80 percent to 90 percent of the human genes in 39 individuals and identified over 40,000 new functional single nucleotide polymorphisms (SNPs). “This unique SNP resource then becomes the basis for seeking associations between genes and disease,” Beasley says. “The genes identified in these association studies may also be therapeutic targets.” Third, Celera acquired Axys Pharmaceuticals to develop drugs against protease targets.
Overall, reaching the full potential of bioinformatics demands collaborations. Companies must work together, teaming up directly or simply making their tools compatible across corporate lines, to give research scientists all of the experimental and analytical power available. Such goodwill could lead to prosperity for companies and researchers alike.
Note: Readers can find out more about the companies and organizations listed by accessing their sites on the World Wide Web (WWW). If the listed organization does not have a site on the WWW or if it is under construction, we have substituted its main telephone number. Every effort has been made to ensure the accuracy of this information. The companies and organizations in this article were selected at random. Their inclusion in this article does not indicate endorsement by either AAAS or Science nor is it meant to imply that their products or services are superior to those of other companies.
Science. ISSN 0036-8075 (print), 1095-9203 (online)