Although my initial training was in biology, I have always been interested in computers. No other technology offers the flexibility and power needed to manipulate large amounts of genetic sequence data. My professional interests have allowed me to move between the worlds of bench research and computer programming. Because of this, I'm well equipped not only to understand the science behind the number crunching but also to relate the numbers to the biology.
Getting the Bioinformatics Bug
I first became aware of bioinformatics while in graduate school at the University of Colorado, Boulder, studying molecular, cellular, and developmental biology. This was 1993, and all of a sudden everyone was doing basic local alignment search tool (BLAST) searches. At the time, I was working on a TGF-beta signaling molecule (an important player in cell-cell communication) that I'd cloned from the nematode Caenorhabditis elegans. Most of my graduate work involved restriction digests and microscopes.
I had zero computer experience; in fact, I could barely type, but a postdoc in the lab helped me do a BLAST search using my gene's protein sequence as a query. It was like magic. I remember looking at the report and realizing that there were all of these similar genes in all of these different animals and thinking: "Maybe a careful reading of the papers associated with the sequences most similar to mine will tell me something about my own gene." It was a watershed moment for me, and it really paid off careerwise.
I was an instant bioinformatics convert, but one with no computer skills. What opened the door for me was phylogenetics, the field of biology that deals with identifying and understanding the relationships among the many different kinds of life. My TGF-beta gene was part of a big family of genes, and I began making phylogenetic trees in order to better understand how it was related to the other family members.
Learning to use publicly available phylogenetic tools, such as PHYLIP, got me accustomed to working with computers and also sparked my curiosity as to how the programs worked. Honestly, though, it was laziness that got me into programming. Making the trees involved a lot of hand editing of the data to get it into the right format, so I began writing scripts--short, simple programs--to do the editing for me.
I recommend a postdoc in a computational biology program for any bench scientist who is serious about pursuing a bioinformatics career. It's a great opportunity to gain valuable skills; you just can't let fear of looking stupid hold you back.
When it came time to find a postdoc, I was fortunate to land a position as an annotator at the Genome Sequencing Center at Washington University in St. Louis, Missouri, in 1996. This proved a huge break for me. My fellow postdocs were mathematicians, computer scientists, and engineers. It was an awesome scientific cross-cultural opportunity. And frankly, it was also terrifying. I could barely even write scripts, and here I was among people whose idea of science was a blackboard filled with mathematical symbols. It was daunting, but I swallowed my pride and started asking questions, lots of them. I soon discovered that I had a lot to contribute, too. As it turned out, many of my colleagues were as confused by biology as I was by Bayesian statistics.
The Real World of Commercial Software
In 1999 I left Washington University and joined Celera Genomics. The switch to industry was a big change. Although Celera very much resembled an academic genome center from a scientific standpoint, software was another matter. At Celera I was thrown head-first into the real world of commercial software and professional programmers in a start-up environment. Every scientist knows that industrial programmers command high salaries. Few realize, however, just how hard they work for those salaries. It was like a professional sporting event in its intensity. I couldn't believe how competitive it was and how much more sophisticated industry was than academia when it came to software. Even though my postdoc proved excellent preparation, I really had to struggle to succeed there.
It turned out well, though. I eventually became leader of Annotation Software R&D--the group that wrote much of the software used to annotate and analyze the various genomes sequenced at Celera. In retrospect, I attribute my success in industry to three things: being a better scientist than a programmer, decent management skills, and the scripting language Perl. As a postdoc I decided to learn Perl rather than a language such as Java or C. Mastery of at least one programming language is essential.
Although there is no single best language for bioinformatics, there's something about Perl that really resonates with a lot of biologists. I love it. It's great if you are working with text--which is what most bioinformatics data are--and development times are fast. Time and again my group was able to meet nearly impossible deadlines during the hectic days of the human genome race because Perl is so fast to develop in. Plus, using Perl means that you can use BioPerl, which really helps.
Being a scientist rather than a programmer by training was also a huge career advantage: I understood the science. Management skills also helped. Fortunately, the people I reported to had them, and I did my best to emulate their style. Effective management of a group is a lot trickier than many scientists like to acknowledge; it's a valuable skill.
Moving Full Circle
I've managed to do something that a lot of my biologist friends thought would prove impossible: transition out of industry and back into academia. Although physicists, computer scientists, and engineers do this frequently, it's still a rare event for biologists. I hope it will become less so in the future. My time in industry really has worked for me. I made a lot of great contacts there, but I always wanted to pursue an academic career.
In this regard I'm especially grateful to the Howard Hughes Medical Institute (HHMI), which graciously offered to support my work at the Berkeley Drosophila Genome Project during my transition. I hope that other funding agencies will follow the HHMI lead and make more funds available to assist biologists who have gained valuable industry experience. This will make more transitions back into academic careers possible.
I think I learned a lot in industry: both how to run a group and how to bring commercial software principles to bear on scientific problems. It has also equipped me with a firsthand view of what industry is really like, an experience few faculty members can draw on when called to give career advice to their students and postdocs.
I'm still a big believer in what bioinformatics has to offer. A lot of people in the field think the Wild West days of bioinformatics are a thing of the past. I don't think that's true. Certainly, the days in which a bench biologist with basic scripting skills could land a programming job in industry are probably gone, but I think opportunities still abound for biologists seeking an academic research career.
These days, bioinformatics can seem like a world of Web pages. If all you ever see of the data is what someone else has made of it, original thoughts, research, and career plans can be hard to come by. It's when you see the data firsthand, and manipulate them with your own software, that you realize that bioinformatics is still a scientific frontier with lots of opportunity. Interested in genes? Learn to program; it will open doors.
Editor's note: Mark Yandell is co-author of BLAST: A Guide to the Basic Local Alignment Search Tool (I. Korf, M. Yandell, J. Bedell, O'Reilly & Associates, 2003; 339 pages. ISBN: 0596002998).
Mark Yandell, Ph.D., is a senior scientist at the Howard Hughes Medical Institute at the University of California, Berkeley. He may be reached at email@example.com.