BLAST – Basic Local Alignment Search Tool (Wikipedia)
From Wikipedia:
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold. Different types of BLASTs are available according to the query sequences. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence. The BLAST program was designed by Stephen Altschul, Warren Gish, Webb Miller, Eugene Myers, and David J. Lipman at the NIH and was published in the Journal of Molecular Biology in 1990.[1]
I found the uses of BLAST of particular interest:
Uses of BLAST
BLAST can be used for several purposes. These include identifying species, locating domains, establishing phylogeny, DNA mapping, and comparison.
- Identifying species: With the use of BLAST, you can possibly correctly identify a species and/or find homologous species. This can be useful, for example, when you are working with a DNA sequence from an unknown species.
- Locating domains: When working with a protein sequence you can input it into BLAST, to locate known domains within the sequence of interest.
- Establishing phylogeny: Using the results received through BLAST you can create a phylogenetic tree using the BLAST web-page. Phylogenies based on BLAST alone are less reliable than other purpose-built computational phylogenetic methods, so should only be relied upon for “first pass” phylogenetic analyses.
- DNA mapping: When working with a known species, and looking to sequence a gene at an unknown location, BLAST can compare the chromosomal position of the sequence of interest, to relevant sequences in the database(s).
- Comparison: When working with genes, BLAST can locate common genes in two related species, and can be used to map annotations from one organism to another.
My interest is not just in the many uses of BLAST in genomics. What of using similar techniques with other data sets?
Are they not composed of “sequences”?
Actually, many string comparison techniques have made it from genetics into record linkage. One example is the Smith-Waterman string distance.
Comment by larsga@garshol.priv.no — November 23, 2012 @ 2:26 pm
Good point. But recall that BLAST was developed because of performance issues with Smith-Waterman relative to the size of the data sets.
I’m curious about less-than-perfect matching that can scale as the size of data sets increases.
I suspect there is a lot of “cross-pollination” that happens in an area as fundamental as string comparison.
Comment by Patrick Durusau — November 23, 2012 @ 7:05 pm
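As a footnote to the exchange above, here is a minimal sketch of Smith-Waterman local alignment scoring applied to ordinary strings, as one might try in record linkage. The function names, the scoring values (match = 2, mismatch = -1, gap = -1), and the normalization into a 0 to 1 similarity are assumptions chosen for illustration; they are not taken from BLAST or from any particular record linkage tool. The nested loop over both strings is the quadratic cost that makes the exact method slow on large data sets.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b.

    Uses a linear gap penalty. The scoring values are illustrative
    assumptions, not canonical parameters.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a fresh local alignment here
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # skip a character of a
                curr[j - 1] + gap,   # skip a character of b
            )
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a, b, match=2):
    """Hypothetical normalization: divide by the best score the shorter string could reach."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return smith_waterman_score(a, b, match=match) / (match * shorter)


if __name__ == "__main__":
    # Two renderings of the same name share the local region "Smith".
    print(smith_waterman_score("Jon Smith", "Smith, Jon"))
    print(similarity("Jon Smith", "Smith, Jon"))
```

Because the score only rewards the best matching region, two records that share a surname but differ in ordering or punctuation still score well, which is the "less than perfect matching" behavior discussed in the comments. Scaling it up is a separate problem: every pair of records costs time proportional to the product of their lengths.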