This Special Advertising Section is brought to you by AAAS OPMS

Drug Discovery and Biotechnology Trends – Bioinformatics: Feeling the Forward Momentum
A growing list of novel tools—including hardware and software—creates new power for exploring applied and theoretical life sciences. This article surveys customized statistical software, souped-up sequencing tools, grid computing, and much more.
by Mike May and Gary Heebner


ADVERTISERS

AAAS
American Association for the Advancement of Science, world’s largest general scientific association and publisher of Science
202-326-7061
www.aaas.org

Affymetrix
DNA microarrays, based on the principles of semiconductor technology
408-731-5000
www.affymetrix.com

Virginia Commonwealth University, Center for the Study of Biological Complexity
university center of excellence for life science research
804-828-5600
www.vcu.edu/csbc


IN THIS ISSUE:
Cluster of computers
Preparing for petaflops
Grid computing
Data mining
Sequencing software
Proteomics
Drug discovery
The companies in this article were selected at random. Their inclusion in this article does not indicate endorsement by either AAAS or Science, nor is it meant to imply that their products or services are superior to those of other companies.

The past few years of press coverage for life science research frequently mentioned data, heaps of data, but even more intriguing news lies ahead. A crucial challenge in the future of bioinformatics involves putting that data to work. Now life scientists hope to plan large experiments, collect loads of data, analyze it, compare data between experiments, and eventually combine all of that information to improve basic theories, biotechnology, and medicine. The professionals interviewed for this article believe that bioinformatics will soon deliver on those desires.

Today’s biology reminds some scientists of physics in the last half of the 20th century, when high-throughput data and theory fed complex simulations that opened up new arenas, such as the advances in particle physics. “Biology is now at that same threshold,” says Ajay Royyuru, who manages the Computational Biology Center at IBM Life Sciences. He feels that the increasing access to high-throughput data plus rapidly developing theory behind pathways, systems biology, and so on will come together in large-scale in silico models. According to Royyuru, this combination of knowledge will generate an enormous impact.

To realize this effect, though, life scientists need tools to make data, keep track of it, run it in models, and more. Royyuru says, “We are on the verge of making phenomenal improvements in our understanding of biology, and this progress is happening because of combining data, theory, and modeling.” A series of new techniques and tools described here will help all biologists feel this forward momentum in bioinformatics.

Computing in Clusters
Computing plays two fundamental roles in bioinformatics and computational biology, according to Royyuru. First, computers participate in data analysis, ranging from accessing high throughput data and sequencing single nucleotide polymorphisms to analyzing microarrays and experiments in proteomics. “The biggest challenge,” says Royyuru, “is reducing the dimensionality of these data so that scientists can understand them.” To do that, computing should combine data mining with biological insight. Second, computers can run in silico models that test biological theories. “You can ask questions like: What happens to a cell system when you knock out a certain gene or administer a specific drug?” Royyuru explains. “In silico modeling must capture homeostatic behavior of cells, systems, and organisms. Then, scientists can explore what happens during disease or other perturbations.” Reaching higher levels of modeling demands increased computing capabilities, and all major information technology vendors—including Apple, IBM, Hewlett-Packard, and Sun Microsystems—offer solutions that address this market.

One computing advance relies on tightly coupled clusters of processors. In other words, many processors can be connected, with high levels of communication between them, to work as a team. Examples include the IBM P series of supercomputers and the Sun Fire 15K server with Sun Cluster software. Tightly coupled clusters work very well for many applications, including simulating molecular biology, chemical kinetics, protein folding, and so on.

The question for the future is: How many processors can be packed into a reasonable space? Today’s machines use a few thousand processors to churn out a few tens of teraflops, or trillion floating point operations per second (flops). “You can do a good amount of science on these,” says Royyuru, “but there is more science to get to that is not possible with these machines.” He thinks it will take petaflops (1,000 trillion flops) machines to simulate the behavior of proteins, for example. To get that kind of power, computer scientists could simply put more boxes in a room or rethink how they put together processors.

Royyuru points out that simply adding boxes to a room will become increasingly difficult, because even 2,000 processors in a traditional design occupy the floor space of a basketball court. A petaflops machine built that way would be too large and draw too much power. Instead, IBM looked for new approaches to computer architecture. In project Blue Gene, for example, investigators at IBM and the Lawrence Livermore National Laboratory converged on so-called cellular architecture, which mimics the way biological structures are composed. Blue Gene should crank out 6 teraflops with a single rack of equipment, or about one thousand processors. This year, scientists at IBM’s Watson Research Laboratory hope to put together half a rack of Blue Gene chips. “We learn as we go along,” Royyuru says, “and we intend to make this hardware relevant to biological research.”
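To put those figures side by side, here is a rough back-of-the-envelope sketch of how many Blue Gene racks a petaflops machine might need. It assumes performance scales linearly with the number of racks, which the article does not claim, and the function name is ours, chosen for illustration.

```python
import math

# From the article: 1 petaflops = 1,000 teraflops,
# and one Blue Gene rack should deliver about 6 teraflops
# from about one thousand processors.
TERAFLOPS_PER_PETAFLOPS = 1_000
TERAFLOPS_PER_RACK = 6
PROCESSORS_PER_RACK = 1_000


def racks_needed(target_teraflops, teraflops_per_rack):
    """Whole racks needed to reach the target, assuming linear scaling."""
    return math.ceil(target_teraflops / teraflops_per_rack)


racks = racks_needed(TERAFLOPS_PER_PETAFLOPS, TERAFLOPS_PER_RACK)
processors = racks * PROCESSORS_PER_RACK

print(f"Racks for one petaflops: {racks}")
print(f"Approximate processors:  {processors:,}")
```

Under that assumption, a petaflops machine would take roughly 167 racks and on the order of 167,000 processors. Real systems rarely scale perfectly, so this is best read as a lower bound.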

Getting on the Grid
The wide collection of ‘omics’—from genomics to, well, who knows what—generated very high expectations, too high according to Liz From, a global life science business strategist at Sun Microsystems. “After you get past the hype,” From says, “you must pay the piper. With so much data and so many different kinds of data, trying to draw conclusions poses a major challenge. This is where information technology can play a significant role by transforming that data into knowledge that will drive new advancements in the industry.” One solution for handling and analyzing so much disparate data comes from grid computing, which connects many computers within and between institutions with a software system, such as the Sun One Grid Engine.

Research at Sun shows the need for grids. Loralyn Mears, Sun’s market segment manager for the life sciences, says, “We did a study and found that the average biotech company has five months’ worth of projects, and most companies had to run those serially. Worse yet, the average company used only 22 percent to 25 percent of the available compute cycles.” With Sun’s Grid Engine technology, Mears says, those companies could use 99 percent of the available cycles, without buying one more computer or taking up any additional space. She adds that the Sun One Grid Engine software scales up easily from a dozen computers to a thousand.
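The utilization figures Mears quotes imply a simple ratio. The sketch below works it out, assuming that completed work is proportional to the compute cycles actually used and that the five months of projects can be spread evenly across the hardware. Both are our simplifications, not claims from Sun's study.

```python
def throughput_gain(util_before, util_after):
    """Ratio of useful cycles after vs. before, on the same hardware."""
    return util_after / util_before


def backlog_months(months, util_before, util_after):
    """Time to clear the same backlog at the higher utilization."""
    return months / throughput_gain(util_before, util_after)


# Figures quoted in the article: 22 to 25 percent before, 99 percent after,
# and five months' worth of projects.
for before in (0.22, 0.25):
    gain = throughput_gain(before, 0.99)
    months = backlog_months(5, before, 0.99)
    print(f"{before:.0%} -> 99%: {gain:.2f}x throughput, "
          f"backlog cleared in about {months:.1f} months")
```

On these assumptions, the same machines would do roughly four to four and a half times as much work, shrinking a five-month backlog to a little over one month.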

“If someone is reasonably information-technology savvy,” Mears says, “he or she can configure a grid in one day with the Sun One Grid Engine. And it can connect machines from Sun, IBM, Silicon Graphics, most anything.” In addition, an open-source version of this software is free. Nevertheless, a company that wants to assign policy restrictions to a grid will need the Enterprise edition, which must be purchased. So far, about eight thousand installations of Sun’s software run grids around the world, and From says, “Life scientists are the biggest consumer group.” For example, Oxford GlycoSciences struggled with BLAST searches that took up to three months to complete, but after installing Sun One Grid Engine software the searches now take about a week.
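The Oxford GlycoSciences example shows what happens when independent jobs stop running one after another. The sketch below simulates that effect with a simple greedy scheduler. The job count, job length, and number of machines are hypothetical values chosen for illustration; the article does not say how the searches were split or how many machines were used, and this is not how the Sun One Grid Engine itself schedules work.

```python
import heapq


def makespan(job_days, n_machines):
    """Greedy list scheduling: each job goes to the machine that frees up first.

    Returns the total elapsed time, in days, until every job has finished.
    """
    free_at = [0.0] * n_machines  # time at which each machine becomes idle
    heapq.heapify(free_at)
    for days in job_days:
        start = heapq.heappop(free_at)        # earliest available machine
        heapq.heappush(free_at, start + days)
    return max(free_at)


# Hypothetical workload: 90 independent searches of one day each.
jobs = [1.0] * 90

serial_days = makespan(jobs, 1)   # one machine, jobs run back to back
grid_days = makespan(jobs, 13)    # hypothetical 13-machine grid

print(f"Serial: {serial_days:.0f} days, grid: {grid_days:.0f} days")
```

With these made-up numbers, three months of serial work finishes in about a week, the same order of improvement the article reports. The sketch ignores data transfer and queueing overhead, which a real grid would add.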

Helping Data Make a Difference
As the collections of data continue to grow and access to them gets easier, scientists start asking new questions. Catherine DeSesa, senior analyst at SPSS, says, “Instead of looking at one tiny process in a cell, biologists ask what’s out there.” To direct that question to all of the existing databases, life scientists need new kinds of software, which comes from companies like Alpha Innotech, PerkinElmer, Silicon Genetics, and SPSS.

For example, DeSesa says, “Clementine is a visual data mining environment which contains a suite of tools that allow the user to access multiple data sources and data types, do transforms and other data preprocessing, utilize a variety of exploratory visualization tools, apply one or more predictive modeling methods and finally, deploy an entire solution.” Today’s data, though, often involves more than numbers. For example, a scientist might apply data mining to entire articles, which LexiQuest Mine, also from SPSS, can do. “You can look at the content, or the important concepts, in large collections of journal articles, efficiently find the concepts that matter to you, structure that text based data, and then link that information with your data set. Scientists need to use all of the available data, and much of that data is in the form of unstructured text,” DeSesa says.

DeSesa and her colleagues help scientists put data to use. She says, “Finally, when we are satisfied with our data mining results, we ask: How do we get this new knowledge into our daily lives?” The final phase of data mining is the deployment of the analysis. SPSS helps customers through this general process, even creating pilot projects to show customers how to use the tools in their specific domains.

Software for Sequencing
Bioinformatics software plays a variety of roles in the general field of sequencing, including assembling genomes and identifying genes and regulatory elements. Software also helps investigators analyze similarities and differences between genes and organisms. Several dozen companies—including DNASTAR, InforMax, and Nonlinear Dynamics—create software for manipulating genes and DNA sequences.

DNASTAR, for instance, makes the Lasergene suite. According to John Schroeder, vice president of research and development at DNASTAR, this software performs many tasks: sequence assembly and finishing, primer design, gene discovery and annotation, sequence pair and family alignment with phylogeny, restriction site analysis and mapping, and protein structure analysis. “Basically, Lasergene provides a wide range of functionality,” Schroeder says. He adds that more than three thousand research articles mention using this software.

A variety of features makes this package so widely used. First, it works with most of the major file formats. Schroeder says, “Our priority is ease of use, and we want this to be as convenient a package as possible.” For example, this package lets a user drag-and-drop whole folders of sequences.

DNASTAR offers other products, too. GenVision—a DNASTAR plug-in for Adobe Illustrator—helps scientists visualize expression data, functional comparisons, genome presentations, and more. StarBlast, on the other hand, stores data and can be used to publish a sequence online.

Integrated Analysis
Proteomics also requires new approaches to bioinformatics. Protein studies often include data from a wide variety of experiments, including mass spectrometry, protein chips, and two-dimensional gel electrophoresis. As a result, scientists need tools that keep track of data and relate one data set to another. Companies like Amersham Biosciences, Bio-Rad, and Oxford GlycoSciences offer those very products.

“No one company covers all aspects of proteomics,” says John Schneider, vice president of marketing at Amersham Biosciences. Protein separation by two-dimensional difference gel electrophoresis (DIGE) is one of Amersham Biosciences’ specialties. Its Ettan DIGE system multiplexes dye molecules to study more than one protein sample in a single gel and detects changes of as little as 10 percent in protein abundance levels. The DeCyder software package guides a user through a DIGE experiment and its analysis. Schneider says, “DeCyder is a very successful, leading edge informatics application for 2D processing, which means image processing, statistical accuracy, and experiment support.”

The increasing scale of experiments also demands advanced infrastructure to keep up with advanced approaches to proteomics. The Scierra Laboratory Workflow Systems from Amersham Biosciences provide data collection and analysis from various applications, including sequencing, microarrays, and proteomics. Schneider says, “This system tracks all of the components involved in manufacturing data, and it is all integrated on one platform.”

Integration also affects the bioinformatics behind DNA and protein arrays. For example, Iobion Informatics makes GeneTraffic software for microarray analysis. Jason Goncalves, cofounder and chief scientific officer at Iobion, says essentially every step in microarrays, from designing them to analyzing them, requires informatics. He also points out ongoing work that integrates expression data with other types of data, such as related literature. For example, PathwayAssist, a Stratagene product, supplies literature-mining tools that enable scientists to extract functional associations from the literature.

The Future of Pharmaceuticals
Drug discovery companies, like Celera Genomics and Millennium, use genomic and proteomic data for developing new pharmaceuticals. Ellen Beasley, director of bioinformatics at Celera, says, “Bioinformatics is especially useful in early stages of target identification and the drug-target validation process.”

In general, Celera takes three approaches to drug discovery. One involves proteomics. For example, Beasley says, “We look for differential expression of cell surface proteins related to cancer.” Celera scientists also use genetics for drug discovery. Investigators from Celera and Celera Diagnostics, for instance, recently resequenced 80 percent to 90 percent of the human genes in 39 individuals and identified over 40,000 new functional single nucleotide polymorphisms (SNPs). “This unique SNP resource then becomes the basis for seeking associations between genes and disease,” Beasley says. “The genes identified in these association studies may also be therapeutic targets.” Third, Celera acquired Axys Pharmaceuticals to develop drugs against protease targets.
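For readers unfamiliar with association studies, the sketch below shows one textbook way to test whether a SNP allele is more common in patients than in healthy controls: a chi-square test on a 2x2 table of allele counts. The counts are invented for illustration, and the article does not describe the statistical methods Celera actually uses.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table
        [[a, b],
         [c, d]]
    without continuity correction.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical allele counts (not from the article):
#              variant allele   reference allele
# cases              60                40
# controls           40                60
chi2 = chi_square_2x2(60, 40, 40, 60)

# 3.841 is the standard critical value for 1 degree of freedom at p = 0.05.
CRITICAL_VALUE_P05 = 3.841
print(f"chi-square = {chi2:.2f}, "
      f"associated at p < 0.05: {chi2 > CRITICAL_VALUE_P05}")
```

A genome-wide study would repeat a test like this across tens of thousands of SNPs, so a much stricter threshold is needed to account for multiple comparisons.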

Overall, reaching the full potential of bioinformatics demands collaborations. Companies must work together—teaming up directly or simply making their tools compatible across corporate lines—to give research scientists all of the experimental and analytical power available. Such goodwill could lead to prosperity for companies and researchers alike.

Mike May is a freelance writer based in Madison, Indiana, U.S.A. Gary Heebner is a marketing consultant serving the scientific industry, based in Foristell, Missouri, U.S.A.

WEBLINKS
FEATURED COMPANIES

Alpha Innotech Corporation
scientific software
www.alphainnotech.com

Amersham Biosciences
data management software
www.amershambiosciences.com

Apple Computer, Inc.
computers and operating systems
www.apple.com

Axys Pharmaceuticals
drug discovery
www.axyspharm.com

Bio-Rad Laboratories
data management software
www.bio-rad.com

Celera Diagnostics
diagnostic products
www.celeradiagnostics.com

Celera Genomics
drug discovery
www.celera.com

DNASTAR
genomics software
www.dnastar.com

Hewlett-Packard
computers and operating systems
www.hp.com

IBM Life Sciences
computers and operating systems
www.ibm.com

InforMax, Inc.
genomics software
www.informaxinc.com

Iobion Informatics
genomics software
www.iobion.com

Lawrence Livermore National Laboratory
government research facility
www.llnl.gov

Millennium Pharmaceuticals, Inc.
drug discovery
www.mlnm.com

Nonlinear Dynamics, Ltd.
genomics software
www.nonlinear.com

Oxford GlycoSciences
drug discovery
www.ogs.com

PerkinElmer Life and Analytical Sciences
scientific software
www.las.perkinelmer.com

Silicon Genetics
scientific software
www.silicongenetics.com

SPSS Science
scientific software
www.spssscience.com

Sun Microsystems
computers and operating systems
www.sun.com

Note: Readers can find out more about the companies and organizations listed by accessing their sites on the World Wide Web (WWW). If the listed organization does not have a site on the WWW or if it is under construction, we have substituted its main telephone number. Every effort has been made to ensure the accuracy of this information.

This article was published
as a special advertising section
in the 26 September 2003 issue of Science





Science. ISSN 0036-8075 (print), 1095-9203 (online)