HBase I/O – HFile
by Matteo Bertozzi
Introduction
Apache HBase is Hadoop’s open-source, distributed, versioned storage manager, well suited for random, realtime read/write access.
Wait, wait? Random, realtime read/write access?
How is that possible? Isn’t Hadoop just a sequential read/write, batch-processing system?
Yes, we’re talking about the same thing, and in the next few paragraphs I’m going to explain how HBase achieves its random I/O, how it stores data, and how HBase’s HFile format has evolved.
Hadoop I/O file formats
Hadoop comes with a SequenceFile[1] file format that you can use to append key/value pairs, but because of HDFS’s append-only nature the format does not allow modifying or removing a value once it has been inserted. The only operation allowed is append, and if you want to look up a specific key you have to read through the file until you find it.
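To make the append-and-scan pattern concrete, here is a minimal sketch against Hadoop’s classic SequenceFile writer/reader API; the path, keys, and values are hypothetical, picked just for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("demo.seq"); // hypothetical file path

        // Write: the only operation a SequenceFile supports is append;
        // there is no way to update or delete a pair once written.
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, path, Text.class, Text.class);
        try {
          writer.append(new Text("row-1"), new Text("value-1"));
          writer.append(new Text("row-2"), new Text("value-2"));
        } finally {
          writer.close();
        }

        // Lookup: there is no index, so the only option is to scan
        // sequentially from the beginning until the key turns up.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
          Text key = new Text();
          Text value = new Text();
          while (reader.next(key, value)) {
            if (key.toString().equals("row-2")) {
              System.out.println("found: " + value);
              break;
            }
          }
        } finally {
          reader.close();
        }
      }
    }

Note that finding “row-2” still means reading every record that precedes it; on a large file the lookup degenerates into a full sequential scan.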
As you can see, you’re forced into a sequential read/write pattern… but how is it possible to build a random, low-latency read/write access system like HBase on top of this?
To help you solve this problem, Hadoop has another file format, called MapFile[1], an extension of the SequenceFile. A MapFile is, in reality, a directory that contains two SequenceFiles: the data file “/data” and the index file “/index”. The MapFile lets you append sorted key/value pairs, and every N keys (where N is a configurable interval) it stores the key and its offset in the index. This allows quite a fast lookup since, instead of scanning all the records, you scan the index, which has far fewer entries. Once you’ve found your block, you can jump into the real data file.
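As a sketch of the same mechanism, the snippet below uses Hadoop’s classic MapFile API: keys are appended in sorted order, setIndexInterval sets the N described above, and MapFile.Reader.get performs the index search followed by the short scan of the data file. The directory name, interval, and rows are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;

    public class MapFileLookup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String dir = "demo.map"; // hypothetical; a MapFile is really a directory

        // Keys must arrive in sorted order. Every N keys the writer records
        // the key and its offset into "/data" inside the "/index" file.
        MapFile.Writer writer =
            new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
        writer.setIndexInterval(128); // N = 128: one index entry per 128 keys
        try {
          for (int i = 0; i < 1000; i++) {
            writer.append(new Text(String.format("row-%04d", i)),
                          new Text("value-" + i));
          }
        } finally {
          writer.close();
        }

        // Lookup: search the small in-memory index, seek into "/data",
        // then scan forward at most N records to reach the exact key.
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        try {
          Text value = new Text();
          if (reader.get(new Text("row-0742"), value) != null) {
            System.out.println("found: " + value);
          }
        } finally {
          reader.close();
        }
      }
    }

Because only every 128th key lands in “/index”, the index stays small enough to search quickly, at the cost of scanning at most 128 records inside “/data” to reach the exact key.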
A couple of important lessons:
First, file formats evolve. They shouldn’t be entombed in application code, no matter how clever that code may be. That is what “versions” are for.
Second, the rapid evolution of the Hadoop ecosystem makes any observation about its boundaries strictly temporary. Wait a week or so and they will change!