Extending the GATK for custom variant comparisons using Clojure by Brad Chapman.
From the post:
The Genome Analysis Toolkit (GATK) is a full-featured library for dealing with next-generation sequencing data. The open-source Java code base, written by the Genome Sequencing and Analysis Group at the Broad Institute, exposes a Map/Reduce framework allowing developers to code custom tools taking advantage of support for: BAM Alignment files through Picard, BED and other interval file formats through Tribble, and variant data in VCF format.
Here I’ll show how to utilize the GATK API from Clojure, a functional, dynamic programming language that targets the Java Virtual Machine. We’ll:
- Write a GATK walker that plots variant quality scores using the Map/Reduce API.
- Create a custom annotation that adds a mean neighboring base quality metric using the GATK VariantAnnotator.
- Use the VariantContext API to parse and access variant information in a VCF file.
The Clojure variation library is freely available and is part of a larger project to provide variant assessment capabilities for the Archon Genomics XPRIZE competition.
Interesting data, commercial potential, cutting-edge technology, and subject identity issues galore. What more could you want?
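If the Map/Reduce walker idea in the first bullet is unfamiliar, here is a rough sketch of its shape. This is not the GATK API and not the code from Brad's post: it is plain Python with no GATK dependency, and `SAMPLE_VCF`, `parse_variants`, `map_variant`, `reduce_quals`, and `collect_quals` are all made-up names over made-up data. It only illustrates the pattern: a map step pulls one value (here the QUAL column) out of each variant record, and a reduce step folds those values into an accumulator that a plotting step could consume later.

```python
from functools import reduce

# Tiny inline VCF sample (hypothetical records, for illustration only).
SAMPLE_VCF = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tG\t50.0\tPASS\tDP=10
chr1\t200\t.\tC\tT\t30.0\tPASS\tDP=8
chr2\t300\t.\tG\tA\t.\tLowQual\tDP=2
"""


def parse_variants(text):
    """Yield one dict per data line, skipping '#' header lines."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, qual = line.split("\t")[:6]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt.split(","),
            # VCF uses '.' for a missing value.
            "qual": None if qual == "." else float(qual),
        }


def map_variant(variant):
    """Map step: extract the quality score from a single variant."""
    return variant["qual"]


def reduce_quals(acc, qual):
    """Reduce step: accumulate the scores that are present."""
    if qual is not None:
        acc.append(qual)
    return acc


def collect_quals(text):
    """Run map then reduce over every variant, starting from an empty list."""
    return reduce(reduce_quals, map(map_variant, parse_variants(text)), [])


if __name__ == "__main__":
    print(collect_quals(SAMPLE_VCF))
```

The list returned by `collect_quals` is what you would hand to a histogram or scatter plot. In the real thing, the GATK framework takes care of iterating over the file and handing each record to the walker, so only the map and reduce logic would be yours to write.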