Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

March 6, 2012

Extending the GATK for custom variant comparisons using Clojure

Filed under: Bioinformatics, Biomedical, Clojure, MapReduce | Patrick Durusau @ 8:09 pm

Extending the GATK for custom variant comparisons using Clojure by Brad Chapman.

From the post:

The Genome Analysis Toolkit (GATK) is a full-featured library for dealing with next-generation sequencing data. The open-source Java code base, written by the Genome Sequencing and Analysis Group at the Broad Institute, exposes a Map/Reduce framework allowing developers to code custom tools taking advantage of support for: BAM Alignment files through Picard, BED and other interval file formats through Tribble, and variant data in VCF format.

Here I’ll show how to utilize the GATK API from Clojure, a functional, dynamic programming language that targets the Java Virtual Machine. We’ll:

  • Write a GATK walker that plots variant quality scores using the Map/Reduce API.
  • Create a custom annotation that adds a mean neighboring base quality metric using the GATK VariantAnnotator.
  • Use the VariantContext API to parse and access variant information in a VCF file.

The Clojure variation library is freely available and is part of a larger project to provide variant assessment capabilities for the Archon Genomics XPRIZE competition.
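The excerpt describes a walker that plots variant quality scores with Map/Reduce and an annotation that computes a mean neighboring base quality. As a rough, library-free illustration of those two ideas, here is a minimal Python sketch. It does NOT use the GATK, its VariantContext API or the Clojure variation library. It assumes plain VCF text lines (QUAL in the sixth tab-separated column, "." for a missing value) and Phred+33 encoded read qualities. The function names `parse_qual`, `add_to_bins`, `quality_histogram` and `mean_neighbor_quality` are hypothetical. The window of five bases on each side is also an assumption, not the definition used in the original post.

```python
from functools import reduce
from typing import Dict, Iterable, Optional

# A plain-Python sketch, NOT the GATK or Clojure variation API.
# VCF data lines are tab-separated: CHROM POS ID REF ALT QUAL FILTER INFO ...


def parse_qual(vcf_line: str) -> Optional[float]:
    """Map step: return the QUAL column (6th field), or None if it is '.'."""
    qual = vcf_line.rstrip("\n").split("\t")[5]
    return None if qual == "." else float(qual)


def add_to_bins(bins: Dict[int, int], qual: Optional[float], width: int) -> Dict[int, int]:
    """Reduce step: count a QUAL value in its fixed-width bin (bin 20 covers [20, 30))."""
    if qual is not None:
        start = int(qual // width) * width
        bins[start] = bins.get(start, 0) + 1
    return bins


def quality_histogram(lines: Iterable[str], width: int = 10) -> Dict[int, int]:
    """Skip header lines, map each record to its QUAL, reduce into bin counts."""
    records = (line for line in lines if line.strip() and not line.startswith("#"))
    return reduce(lambda acc, q: add_to_bins(acc, q, width), map(parse_qual, records), {})


def mean_neighbor_quality(qual_string: str, pos: int, window: int = 5) -> Optional[float]:
    """Hypothetical metric: mean Phred+33 base quality within `window` bases
    on either side of `pos` in one read, excluding `pos` itself."""
    lo, hi = max(0, pos - window), min(len(qual_string), pos + window + 1)
    scores = [ord(c) - 33 for i, c in enumerate(qual_string[lo:hi], start=lo) if i != pos]
    return sum(scores) / len(scores) if scores else None


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14",
        "20\t17330\t.\tT\tA\t3\tq10\tDP=11",
        "20\t1110696\trs6040355\tA\tG,T\t67\tPASS\tDP=10",
        "20\t1234567\tmicrosat1\tGTC\tG,GTCT\t.\tPASS\tDP=9",
    ]
    print(quality_histogram(vcf))  # {20: 1, 0: 1, 60: 1}
    print(mean_neighbor_quality("IIIII#IIIII", 5))  # 40.0
```

Keeping the map step (one record to one value) separate from the reduce step (folding values into an accumulator) mirrors the split a GATK walker asks for. A real walker, however, works with the GATK's own parsed variant records rather than raw text lines.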

Interesting data, commercial potential, cutting-edge technology, and subject identity issues galore. What more could you want?
