Full stack HA in Hadoop 1: HBase’s Resilience to Namenode Failover

Full stack HA in Hadoop 1: HBase’s Resilience to Namenode Failover by Devaraj Das.

From the post:

In this blog, I’ll cover how we tested Full Stack HA with NameNode HA in Hadoop 1 with Hadoop and HBase as components of the stack.

Yes, NameNode HA is finally available in the Hadoop 1 line. The test was done with Hadoop branch-1 and HBase-0.92.x on a cluster of roughly ten nodes. The aim was to try to keep a really busy HBase cluster up in the face of the cluster’s NameNode repeatedly going up and down. Note that HBase would remain functional while the NameNode was down. The outage would affect only those operations that require a trip to the NameNode (for example, rolling of the WAL, or compaction, or flush), and those would affect only the relevant end users (a user using the HBase get API may not be affected if that get didn’t require a new file open, for example).
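To make the quoted distinction concrete, here is a toy sketch of which operations a NameNode outage would touch. It is not HBase or Hadoop code: the operation names, the `requires_new_file_open` flag, and both functions are hypothetical, and the only facts it encodes are the examples named in the quote above.

```python
# Toy model only: none of these names come from the Hadoop or HBase APIs.
# The set below lists the examples the post gives of NameNode-dependent work.
NEEDS_NAMENODE = {"wal_roll", "compaction", "flush"}


def needs_namenode(op, requires_new_file_open=False):
    """Return True if, in this toy model, the operation needs a NameNode trip."""
    return op in NEEDS_NAMENODE or requires_new_file_open


def outcome(op, namenode_up, requires_new_file_open=False):
    """Classify one operation as 'affected' or 'unaffected' by the outage."""
    if not namenode_up and needs_namenode(op, requires_new_file_open):
        return "affected"
    return "unaffected"


if __name__ == "__main__":
    # A get that needs no new file open is untouched while the NameNode is down.
    print(outcome("get", namenode_up=False))
    # A flush needs the NameNode, so it is affected by the same outage.
    print(outcome("flush", namenode_up=False))
```

In this model a `get` is affected only when it needs a new file open, which matches the caveat in the quote. What "affected" means in practice (waiting, retrying, or failing) is not stated in the excerpt, so the sketch does not model it.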

A non-reliable cluster is just that, a non-reliable cluster. Not as bad as a backup that may or may not restore your data, but almost.

Regularly and routinely test any alleged HA capability along with backup restore capability. Document that testing.

That is, as opposed to “testing” only at the moment when one of them has to work or else critical operations will fail or critical data will be lost.*

*Not Miller but résumé time.
