Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 19, 2012

All Your HBase Are Belong to Clojure

Filed under: Clojure,Hadoop,HBase — Patrick Durusau @ 7:41 pm

All Your HBase Are Belong to Clojure by

I’m sure you’ve heard a variation on this story before…

So I have this web crawler and it generates these super-detailed log files, which is great ‘cause then we know what it’s doing but it’s also kind of bad ‘cause when someone wants to know why the crawler did this thing but not that thing I have, like, literally gajigabytes of log files and I’m using grep and awk and, well, it’s not working out. Plus what we really want is a nice web application the client can use.

I’ve never really had a good solution for this. One time I crammed this data into a big Lucene index and slapped a web interface on it. One time I turned the data into JSON and pushed it into CouchDB and slapped a web interface on that. Neither solution left me with a great feeling although both worked okay at the time.

This time I already had a Hadoop cluster up and running, I didn’t have any experience with HBase but it looked interesting. After hunting around the internet, thought this might be the solution I had been seeking. Indeed, loading the data into HBase was fairly straightforward and HBase has been very responsive. I mean, very responsive now that I’ve structured my data in such a way that HBase can be responsive.

And that’s the thing: if you are loading literally gajigabytes of data into HBase you need to be pretty sure that it’s going to be able to answer your questions in a reasonable amount of time. Simply cramming it in there probably won’t work (indeed, that approach probably won’t work great for anything). I loaded and re-loaded a test set of twenty thousand rows until I had something that worked.

An excellent tutorial on Hadoop, HBase and Clojure!
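The point in the quoted passage about structuring data so that HBase can be responsive is worth a small illustration. HBase keeps rows sorted by row key, so a common approach is to put the thing you will query by at the front of the key. The sketch below is not taken from the tutorial. It simulates the idea in plain Python with a sorted list, and the key layout (reversed host name, zero-padded timestamp, sequence number), the function names, and the sample events are all assumptions made up for illustration.

```python
from bisect import bisect_left


def make_row_key(host: str, timestamp_ms: int, seq: int) -> bytes:
    """Build a hypothetical composite row key for one crawler log event.

    Layout (an assumption, not the tutorial's schema):
    reversed host | zero-padded timestamp | zero-padded sequence number.
    Zero padding keeps numeric order and byte order in agreement.
    """
    reversed_host = ".".join(reversed(host.split(".")))
    return f"{reversed_host}|{timestamp_ms:013d}|{seq:06d}".encode("utf-8")


def prefix_scan(sorted_keys, prefix: bytes):
    """Return every key starting with prefix from a sorted list of keys.

    Binary search finds the first candidate, then we read forward until
    the prefix stops matching. This imitates a prefix scan over a store
    that keeps its rows in sorted key order.
    """
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result


# Made-up crawler log events: (host, timestamp in ms, sequence number).
events = [
    ("www.example.com", 1326999660000, 1),
    ("blog.example.org", 1326999661000, 1),
    ("www.example.com", 1326999662000, 2),
]

# Sorting the keys stands in for the sorted order a row-key store maintains.
keys = sorted(make_row_key(*event) for event in events)

# All events for one host sit next to each other, in time order.
matches = prefix_scan(keys, b"com.example.www|")
for key in matches:
    print(key.decode("utf-8"))
```

With a layout like this, "why did the crawler do this on that host?" becomes a short read over neighboring rows rather than a pass over everything, which is roughly the kind of restructuring the author describes iterating on with a small test set.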

First seen at myNoSQL, but the URL is no longer working in my Google Reader.
