Bioinformatics Tools in Haskell by Udo Stenzel.
From the post:
This is a collection of miscellaneous stuff that deals mostly with high-throughput sequencing data. I took some of my throw-away scripts that had developed a life of their own, separated out a library, and cleaned up the rest. Everything is licensed under the GPL and naturally comes without any warranty.
Most of the stuff here is written in Haskell. The natural way to run these programs is to install the Haskell Platform, which may be as easy as running ‘apt-get install haskell-platform’, e.g. on Debian Testing aka “Squeeze”. Alternatively, you can install the Glasgow Haskell Compiler and Cabal individually. After that, download, unpack and ‘cabal install’ Biohazard first, then install whatever else you need.

If you don’t want to become a Haskell programmer (you really should), you can still download the binary packages (for Linux on x86_64) and hope that they work. You’ll probably need to install GNU MP (‘apt-get install libgmp-dev’ might do it). If the binaries don’t work for you, I don’t care; use the source instead.
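Spelled out, the install sequence described above would look something like this on a Debian-like system (the tarball name is a placeholder; take the actual file from Stenzel’s download page):

    apt-get install haskell-platform    # GHC and Cabal in one step
    tar xzf biohazard-<version>.tar.gz  # unpack the downloaded source tarball
    cd biohazard-<version>
    cabal install                       # build and install the library first

After that, repeat the download/unpack/‘cabal install’ cycle for whichever of the tools you actually need.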
Good for bioinformatics, and I suspect also good for learning Haskell in high-throughput situations.
Speculation: How will processing change when there are only "high-throughput data streams"?

That is, there isn't any "going back" to find legacy data; you just wait for it to reappear in the stream.

Or suppose there were streams of "basic" data that doesn't change much, alongside other streams of "new" or rapidly changing data.

If that sounds wasteful of bandwidth, imagine bandwidth increasing at the same rate as local storage, so that the incoming data connection to your home computer runs at 1 Tb/s or more.
Would you really need local storage at all?
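As a sketch of what stream-only processing might look like in Haskell (my illustration, not part of Biohazard), the program below counts FASTA records from an incoming stream in constant memory. Because it reads lazily from stdin, it never touches local storage; the same code works whether the stream is a file today or a terabit feed tomorrow:

    -- CountFasta.hs: count FASTA records arriving on stdin,
    -- processing the stream lazily in constant memory.
    module Main where

    import qualified Data.ByteString.Lazy.Char8 as L

    -- A FASTA record starts with a '>' header line.
    isHeader :: L.ByteString -> Bool
    isHeader l = not (L.null l) && L.head l == '>'

    main :: IO ()
    main = do
      input <- L.getContents    -- lazy: chunks arrive as the stream delivers them
      print (length (filter isHeader (L.lines input)))

Run it as, e.g., ‘zcat reads.fa.gz | runghc CountFasta.hs’; nothing is ever written to disk.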