[HBase] Coprocessor Introduction

Wednesday, February 1st, 2012

[HBase] Coprocessor Introduction, by Mingjie Lai, Eugene Koontz and Andrew Purtell of the Trend Micro Hadoop Group.

From the post:

HBase has very effective MapReduce integration for distributed computation over data stored within its tables, but in many cases – for example simple additive or aggregating operations like summing, counting, and the like – pushing the computation up to the server where it can operate on the data directly without communication overheads can give a dramatic performance improvement over HBase’s already good scanning performance.

Also, before 0.92, it was not possible to extend HBase with custom functionality except by extending the base classes. Due to Java’s lack of multiple inheritance this required extension plus base code to be refactored into a single class providing the full implementation, which quickly becomes brittle when considering multiple extensions. Who inherits from whom? Coprocessors allow a much more flexible mixin extension model.

In this article I will introduce the new Coprocessors feature of HBase, a framework for both flexible and generic extension, and of distributed computation directly within the HBase server processes. I will talk about what it is, how it works, and how to develop coprocessor extensions.

If you are using HBase, this looks like a must-read article. It also covers how to develop your own coprocessor extensions.

I first saw this at myNoSQL.