BioClojure: a functional library for the manipulation of biological sequences

BioClojure: a functional library for the manipulation of biological sequences by Jordan Plieskatt, Gabriel Rinaldi, Paul J. Brindley, Xinying Jia, Jeremy Potriquet, Jeffrey Bethony, and Jason Mulvenna.

Abstract:

Motivation: BioClojure is an open-source library for the manipulation of biological sequence data written in the language Clojure. BioClojure aims to provide a functional framework for the processing of biological sequence data that provides simple mechanisms for concurrency and lazy evaluation of large datasets.

Results: BioClojure provides parsers and accessors for a range of biological sequence formats, including UniProtXML, Genbank XML, FASTA and FASTQ. In addition, it provides wrappers for key analysis programs, including BLAST, SignalP, TMHMM and InterProScan, and parsers for analyzing their output. All interfaces leverage Clojure’s functional style and emphasize laziness and composability, so that BioClojure, and user-defined, functions can be chained into simple pipelines that are thread-safe and seamlessly integrate lazy evaluation.

Availability and implementation: BioClojure is distributed under the Lesser GPL, and the source code is freely available from GitHub (https://github.com/s312569/clj-biosequence).

Contact: jason.mulvenna@qimberghofer.edu.au or jason.mulvenna@qimr.edu.au

The introduction to this article is a great cut-n-paste “case for Clojure in bioinformatics.”

Functional programming is a programming style that treats computation as the evaluation of mathematical functions (Hudak, 1989). In its purest form, functional programming removes the need for variable assignment by using immutable data structures that eliminate the use of state and side effects (Backus, 1978). This ensures that functions will always return the same value given the same input. This greatly simplifies debugging and testing, as individual functions can be assessed in isolation regardless of a global state. Immutability also greatly simplifies concurrency and facilitates leveraging of multi-core computing facilities with little or no modifications to functionally written code. Accordingly, as a programming style, functional programming offers advantages for software development, including (i) brevity, (ii) simple handling of concurrency and (iii) seamless integration of lazy evaluation, simplifying the handling of large datasets. Clojure is a Lisp variant that encourages a functional style of programming by providing immutable data structures, functions as first-class objects and uses recursive iteration as opposed to state-based looping (Hickey, 2008). Clojure is built on the Java virtual machine (JVM), and thus, applications developed using BioClojure can be compiled into Java byte code and run on any platform that runs the JVM. Moreover, libraries constructed using Clojure can be called in Java programs and, conversely, Java classes and methods can be called from Clojure programs, making available a large number of third-party Java libraries. BioClojure aims to leverage the tools provided by Clojure to provide a functional interface with biological sequence data and associated programs.

BioClojure is similar in intent to other bioinformatics packages such as BioPerl (Stajich et al., 2002), BioPython (Cock et al., 2009), Bio++ (Dutheil et al., 2006) and BioJava (Prlić et al., 2012) but differs from these bioinformatics software libraries in its embrace of the functional style. With the decreasing cost of biological analyses, for example, next-generation sequencing, biologists are dealing with greater amounts of data, and BioClojure is an attempt to provide tools, emphasizing concurrency and lazy evaluation, for manipulating these data.
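
For readers who want to see the ideas in that passage rather than take them on faith, here is a minimal sketch of immutable records, a pure function and a lazily evaluated pipeline. It is written in Python rather than Clojure so that nobody mistakes it for BioClojure's actual API. Every name in it (`FastaRecord`, `parse_fasta`, `gc_content`, the sample sequences) is my own invention for illustration and comes from neither the paper nor the library.

```python
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class FastaRecord(NamedTuple):
    """Immutable record: fields cannot be reassigned after creation."""
    header: str
    sequence: str


def parse_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Lazily yield one record at a time from an iterable of FASTA lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield FastaRecord(header, "".join(chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield FastaRecord(header, "".join(chunks))


def gc_content(record: FastaRecord) -> float:
    """Pure function: the result depends only on the argument."""
    seq = record.sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical in-memory input; an open file would work the same way,
# because the parser only needs an iterable of lines.
example_lines = [
    ">seq1 hypothetical example",
    "ATGC",
    "GGCC",
    ">seq2 hypothetical example",
    "ATAT",
]

# Nothing is parsed until the pipeline is consumed, and islice stops
# pulling records once it has the two it was asked for.
pipeline = ((rec.header, gc_content(rec)) for rec in parse_fasta(example_lines))
first_two = list(islice(pipeline, 2))
```

Because `gc_content` touches no shared state, it could in principle be handed to a parallel map without changing its body, which is the property the authors are claiming for functionally written code. Whether that pays off on real sequencing data is exactly the kind of thing that can be measured.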

I like the introduction as a form of evangelism, but using Clojure and BioClojure in bioinformatics to demonstrate their advantages is the better form of promotion.

Evangelism works best when results are untestable, not so well when results can be counted and measured.
