Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 29, 2013

Useful Unix/Linux One-Liners for Bioinformatics

Filed under: Bioinformatics,Linux OS,Text Mining,Uncategorized — Patrick Durusau @ 6:36 pm

Useful Unix/Linux One-Liners for Bioinformatics by Stephen Turner.

From the post:

Much of the work that bioinformaticians do is munging and wrangling around massive amounts of text. While there are some “standardized” file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times where knowing a little bit of Unix/Linux is extremely helpful, namely awk, sed, cut, grep, GNU parallel, and others.

This is by no means an exhaustive catalog, but I’ve put together a short list of examples using various Unix/Linux utilities for text manipulation, from the very basic (e.g., sum a column) to the very advanced (munge a FASTQ file and print the total number of reads, total number unique reads, percentage of unique reads, most abundant sequence, and its frequency). Most of these examples (with the exception of the SeqTK examples) use built-in utilities installed on nearly every Linux system. These examples are a combination of tactics I used every day and examples culled from other sources listed at the top of the page.
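To give a feel for the two ends of that range, here is a rough sketch of my own, not copied from Turner's post. The function names `sum_column` and `fastq_stats` are hypothetical, the FASTQ input is assumed to be strictly four lines per record, and "unique reads" is taken here to mean distinct sequences, which may not match the definition used in the original one-liner.

```shell
# Sum a numeric column of whitespace-delimited input.
# Usage: sum_column COLUMN [FILE...]   (reads stdin when no file is given)
sum_column() {
    col=$1
    shift
    awk -v col="$col" '{ sum += $col } END { print sum + 0 }' "$@"
}

# Summarize a FASTQ file, assuming exactly four lines per record
# (header, sequence, "+", qualities), so sequences sit on lines 2, 6, 10, ...
# Prints: total reads, distinct sequences, percent distinct,
#         most abundant sequence, and its count.
# Ties for "most abundant" are broken arbitrarily (awk array order is unspecified).
fastq_stats() {
    awk 'NR % 4 == 2 { total++; count[$0]++ }
         END {
             if (total == 0) { print "no reads found"; exit 1 }
             for (s in count) {
                 distinct++
                 if (count[s] > max) { max = count[s]; best = s }
             }
             printf "%d %d %.1f %s %d\n", total, distinct, 100 * distinct / total, best, max
         }' "$@"
}
```

Something like `sum_column 2 counts.txt` or `fastq_stats reads.fq` would be the typical call, where both file names are placeholders. Multi-line FASTQ records, which the format permits, would break the line-counting assumption.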

What one-liners do you have lying about?

For what data sets?
