Bioperl is a tool kit for bioinformatics software development. It is a suite of Perl modules designed to parse and manipulate the various types of data used in bioinformatics. These data include sequences, annotations, and sequence alignments, and the tool kit provides parsers for the many different file formats in which they are stored.
Reusable Building Blocks
Bioperl is a tool kit for building applications in Perl, as opposed to an already written suite of tools like EMBOSS. Bioperl offers a set of reusable components that can be used like building blocks to construct applications for bioinformatics analysis. Bioperl's goal is to interface with existing software tools, so much effort has been spent to ensure that results from various applications can be processed.
The Bioperl project was started in 1996 by several participants in the Virtual School of Natural Sciences BioComputing Division (VSNS-BCD) course at Bielefeld University in Germany. They spawned the initial Bioperl project, which was freely distributable code under the Perl Artistic license. The project later matured and attracted new volunteers who drove it in its current object-oriented direction. Because the tool kit was being actively used at sequencing centers and by annotation groups, much of the focus was on building tools for sequence analysis.
The project was made open-source from the outset because the founders felt very strongly that they did not want to see a duplication of effort. By making the code available, they allowed other people to build upon their work rather than reinvent solutions. Because the participants came from both industry and academic backgrounds, a license agreement was chosen that allowed as many people as possible to use the tool kit while encouraging user contribution through a vibrant mailing list.
In 2001 the Open Bioinformatics Foundation (O|B|F), a nonprofit foundation, was created as a legal entity to own the hardware that Bioperl and related projects had acquired and to assume responsibility for events such as the Bioinformatics Open Source Conference (BOSC).
Open Processes Encourage High Quality
Because Bioperl is open-sourced under the Perl Artistic license, which places few restrictions on code redistribution, individuals building bioinformatics solutions in a variety of environments can easily obtain the tool kit. The licensing allows more people to contribute and encourages contributions ranging from bug reports to new modules. Had this been a commercial product, or had the tool kit been restrictively licensed, there would have been much less involvement and little to no input from individuals working in commercial enterprises.
This community involvement allows the project to be open to more than just the ideas of the major organizers. In fact, the leadership positions have often rotated to different sets of individuals in the 9 years of the project's existence. The Bioperl project does not receive any direct funding but relies on donations of volunteers' time and the generosity of their respective institutions for its ongoing development.
Most of the coordination of the project is done by e-mail. The developers typically meet at least once a year at the O|B|F-sponsored BOSC. Many of the developers also run in the same scientific circles and bump into one another at various meetings.
Contributions to Bioperl are typically made by submitting a bug report or patch to the mailing list or the project's bug-tracking server. A Bioperl developer will then make the necessary changes to the source code if the patch or new module is appropriate. Once someone has shown that they understand how to write Bioperl modules and conform to the coding standards, they are given an account to make changes to the code repository.
Jason Stajich is a Ph.D. candidate at the Duke University Department of Molecular Genetics and Microbiology and a National Science Foundation predoctoral fellow. Stajich is a core developer for Bioperl and participates in open-source software projects.