From the post:
Filters are a powerful feature of HBase that delegates the selection of rows to the servers rather than moving rows to the client. We present the filtering mechanism as an illustration of the general data locality principle and compare it to the traditional select-and-project data access pattern.
Dealing with massive amounts of data changes the way you think about data processing tasks. In a standard business application context, people use a Relational Database Management System (RDBMS) and consider this system as a service in charge of providing data to the client application. How this data is processed, manipulated, and shown to the user is considered to be the full responsibility of the application. In other words, the role of the data server is restricted to what it does best: efficient, safe and consistent storage and access.
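The mechanism is easy to see in code. Here is a minimal sketch using the HBase Java client; the table name, column family, qualifier and value are hypothetical, chosen only to show a filter being evaluated on the region servers rather than in the client:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FilterScanExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("suspects"))) {

                // Server-side filter: only rows whose info:location column equals
                // "Gotham" are returned; non-matching rows never leave the servers.
                // (Table, family, qualifier and value are hypothetical examples.)
                SingleColumnValueFilter filter = new SingleColumnValueFilter(
                        Bytes.toBytes("info"),
                        Bytes.toBytes("location"),
                        CompareOp.EQUAL,
                        Bytes.toBytes("Gotham"));
                filter.setFilterIfMissing(true); // drop rows that lack the column

                Scan scan = new Scan();
                scan.setFilter(filter);

                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result row : scanner) {
                        System.out.println(Bytes.toString(row.getRow()));
                    }
                }
            }
        }
    }

Only the matching rows cross the wire; everything else is discarded where the data lives, which is the data locality point the post is making.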
The post goes on to observe:
When you deal with BigData, the data center is your computer.
True, but that isn’t the lesson I would draw from HBase Filters.
The lesson I would draw is: it is only big data until you can find the relevant data.
I may have to sift several haystacks of data, but at the end of the day I want the name, photo, location, target, and time frame for any particular evil-doer. That "big data" was part of the process is a fact, not a goal. Yes?