Bioinformatics: Computational Biology: From Data to Knowledge


Peer Bork, head of the Comparative Sequence Analysis group at the European Molecular Biology Laboratory (EMBL) in Heidelberg, gives an inside view of biocomputing research at one of Europe's top research institutions and also points out what qualifies young researchers to join the bioinformatics elite.

Bioinformatics is a buzzword for an exploding field that has developed various flavors. Currently, the mainstream applications are the management of heterogeneous biological data and genome analysis. Nucleic acid and protein sequences, as well as three-dimensional structures of proteins, are frequently the objects of study, but gene expression as well as the simulation of pathways and biochemical networks are attracting more and more attention.

Our group works in an area within bioinformatics called computational biology. Despite the flood of molecular data coming from novel high-throughput technologies, our research is mostly question-driven and tries to exploit data to gain biological insights. In collaboration with various experimental groups worldwide, we concentrate on comparative genome analysis at various levels. For example, we are currently trying to predict biochemical pathways by analyzing the conservation of genome organization (e.g., the neighborhood of genes) in the more than 30 completely sequenced genomes that are publicly available.
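The idea of conserved gene neighborhood can be illustrated with a minimal sketch. It assumes that each genome has already been reduced to an ordered list of ortholog-group labels on a single linear chromosome; the function name `conserved_neighbors`, the `window` and `min_genomes` parameters, and the toy data are all hypothetical and are not taken from our group's actual software. The sketch simply counts in how many genomes two ortholog groups lie next to each other; pairs that stay adjacent across several genomes are candidates for a functional link, such as membership in the same pathway.

```python
from collections import Counter


def conserved_neighbors(genomes, window=1, min_genomes=2):
    """Return ortholog-group pairs found within `window` genes of each
    other in at least `min_genomes` genomes, with the genome count.

    Assumption: each genome is one linear chromosome given as an
    ordered list of ortholog-group labels (hypothetical input format).
    """
    counts = Counter()
    for gene_order in genomes.values():
        pairs_here = set()  # count each pair at most once per genome
        for i, gene in enumerate(gene_order):
            for other in gene_order[i + 1 : i + 1 + window]:
                if other != gene:
                    pairs_here.add(frozenset((gene, other)))
        counts.update(pairs_here)
    return {
        tuple(sorted(pair)): n
        for pair, n in counts.items()
        if n >= min_genomes
    }


# Toy gene orders (invented for illustration only).
genomes = {
    "org1": ["A", "B", "C", "D"],
    "org2": ["C", "A", "B", "E"],
    "org3": ["B", "A", "D", "C"],
}
result = conserved_neighbors(genomes)
print(result)
```

In this toy example, groups A and B are adjacent in all three gene orders and C and D in two of them, so only those two pairs pass the threshold. A realistic analysis would also have to handle circular chromosomes, strand orientation, and the evolutionary distance between the genomes compared, none of which this sketch addresses.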

By combining genomic context with homology searches, we hope to identify not only the common but also the distinct biochemical pathways of each organism studied. We also focus on individual genes and proteins and identify functional domains therein. Furthermore, the analysis of data on single-nucleotide polymorphisms, the main cause of difference between human beings, allows the prediction of phenotypic effects of a particular genetic variation.

The future lies in integrating all those approaches in order to understand cells as biological systems. Basic research can then lead to hypotheses with immediate impact in medicine and the pharmaceutical industry. The sequencing of the human genome is accelerating all those efforts, and with each new model organism whose sequence has been determined, the power of comparative analysis increases. Computational biology is already leading to many discoveries in a very short time span. In recent years, our group alone has published 30 to 40 research papers annually, not only on genomewide patterns but also on in-depth analysis of particular protein families. The latter often serves as a basis for large-scale analysis, as it is easy to come up with big numbers but hard to relate them to particular biological problems.

In order to tackle all those problems and take up the challenges, enthusiastic people with a background in biology or informatics are most welcome (i.e., are desperately needed); physicists and physicians also make essential contributions. The variety of questions, coupled with challenges in methodology ranging from statistics to informatics, requires different viewpoints.

In our group at EMBL, for example, no two of the roughly 15 people have a comparable background. Genetics, biochemistry, organic chemistry, theoretical biology, medicine, agriculture, cybernetics, physics, and mathematics were the major fields from which people found their way into this new kind of research. Formal training in bioinformatics was impossible until very recently, so students and even postdocs interested in the field were educated on the spot. On the other hand, everybody brought a different view on the same data.

At EMBL, another factor also comes into play: the multinational culture that allows comparative analysis on many topics (mostly applied to the social life). Group members currently come from the United States, Russia, Croatia, Japan, China, Spain, England, the Netherlands, and Germany. Thus, it became obvious in discussions that almost all countries "missed the train" in supporting bioinformatics education, so that the demand for trained people has been huge in recent years. The pharmaceutical and biotech industries are offering attractive starting salaries to get qualified people on board. Professorships in bioinformatics are currently announced almost biweekly in Germany alone. Despite the current efforts to enable proper training, the shortage of experts will remain a bottleneck for at least another 5 years or so.

Thus, currently most young people learn bioinformatics in a "do-it-yourself" manner, but recently, we have observed an increase in general computing skills, probably because the current generation has had preschool computer contact. For example, among the latest EMBL selection for the international Ph.D. program, there are quite a few students of biology who financed their studies by working as system managers or software developers, i.e., with the right kind of general background.

The future is bright for bioinformaticians: Almost every experimental group will need experts who can link the local data to the world knowledge base and vice versa. Knowledge extraction and data mining will become even more important as experimental biology moves to a larger scale. It will often be a bioinformatician who generates the right hypothesis that determines the success of an experiment or who speeds up knowledge creation. Perhaps a last caveat: Bioinformatics is not just applying computational methods to biology (the informatician's view) or simply programming up a biological question (the biologist's view). Given the many pitfalls and the limited quality of most of the diverse data, success demands experience and a deep understanding of multidisciplinary research.

