From the post:
This is the story behind our PNAS paper, “Scaling Metagenome Assembly with Probabilistic de Bruijn Graphs” (released from embargo this past Monday).
Why did we write it? How did it get started? Well, rewind the tape 2 years and more…
There we were in May 2010, sitting on 500 million Illumina reads from shotgun DNA sequencing of an Iowa prairie soil sample. We wanted to reconstruct the microbial community contents and structure of the soil sample, but we couldn’t figure out how to do that from the data. We knew that, in theory, the data contained a number of partial microbial genomes, and we had a technique — de novo genome assembly — that could (again, in theory) reconstruct those partial genomes. But when we ran the software, it choked — 500 million reads was too big a data set for the software and computers we had. Plus, we were looking forward to the future, when we would get even more data; if the software was dying on us now, what would we do when we had 10, 100, or 1000 times as much data?
A perfect post to read over the weekend!
Not all research ends successfully, but when it does, it makes for an inspiring story.