ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data by Kai Wang, Mingyao Li, and Hakon Hakonarson. (Nucl. Acids Res. (2010) 38 (16): e164. doi: 10.1093/nar/gkq603)
In case you are unfamiliar with ANNOVAR, the software mentioned in "gSearch: a fast and flexible general search tool for whole-genome sequencing", here is the abstract:
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a ‘variants reduction’ protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
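To make the "variants reduction" idea in the abstract concrete, here is a minimal Python sketch of a stepwise filter. It is not ANNOVAR's code or command-line interface: the `Variant` fields, the specific filter steps and their order, and the "at least two surviving variants per gene" rule for a recessive disease are all assumptions made for illustration, loosely following the annotations the abstract lists (functional consequence, conserved regions, 1000 Genomes Project, dbSNP).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical, already-annotated variant record (not an ANNOVAR format)."""
    gene: str
    pos: int
    effect: str      # e.g. "nonsynonymous", "synonymous", "intronic"
    conserved: bool  # falls in a conserved region
    in_1000g: bool   # reported in the 1000 Genomes Project
    in_dbsnp: bool   # reported in dbSNP


# Assumed set of protein-changing effects; the labels are illustrative only.
PROTEIN_CHANGING = {"nonsynonymous", "frameshift", "stopgain", "splicing"}

# Each step is (label, predicate); a variant survives a step if the predicate is True.
Step = Tuple[str, Callable[[Variant], bool]]
STEPS: List[Step] = [
    ("protein-changing", lambda v: v.effect in PROTEIN_CHANGING),
    ("in conserved region", lambda v: v.conserved),
    ("absent from 1000 Genomes", lambda v: not v.in_1000g),
    ("absent from dbSNP", lambda v: not v.in_dbsnp),
]


def reduce_variants(variants: Iterable[Variant], steps: List[Step] = STEPS):
    """Apply the filters in order, recording how many variants remain after each."""
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))
    return remaining, counts


def candidate_genes(variants: Iterable[Variant], min_hits: int = 2) -> List[str]:
    """Assumed recessive model: keep genes carrying at least `min_hits` survivors."""
    hits = Counter(v.gene for v in variants)
    return sorted(gene for gene, n in hits.items() if n >= min_hits)


if __name__ == "__main__":
    # Made-up records with placeholder gene names, purely to exercise the code.
    demo = [
        Variant("GENE_A", 100, "nonsynonymous", True, False, False),
        Variant("GENE_A", 250, "stopgain", True, False, False),
        Variant("GENE_B", 300, "nonsynonymous", True, False, True),
        Variant("GENE_C", 400, "synonymous", True, False, False),
        Variant("GENE_D", 500, "nonsynonymous", True, False, False),
    ]
    survivors, counts = reduce_variants(demo)
    for label, n in counts:
        print(f"{label}: {n}")
    print("candidates:", candidate_genes(survivors))
```

Keeping each step as a labeled predicate means the per-step counts can be reported, which is the useful part of a reduction protocol: you can see how much each exclusion contributes before trusting the final candidate list.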
Approximately two years separate ANNOVAR from gSearch. That should give you an idea of the speed of development in bioinformatics. They haven’t labored for more than a decade over finding a syntax for everyone to use. I suspect there is a lesson in there somewhere.