Understanding User Authentication and Authorization in Apache HBase by Matteo Bertozzi.
From the post:
With the default Apache HBase configuration, everyone is allowed to read from and write to all tables available in the system. For many enterprise setups, this kind of policy is unacceptable.
Administrators can set up firewalls that decide which machines are allowed to communicate with HBase. However, machines that can pass the firewall are still allowed to read from and write to all tables. This kind of mechanism is effective but insufficient because HBase still cannot differentiate between multiple users that use the same client machines, and there is still no granularity with regard to HBase table, column family, or column qualifier access.
In this post, we will discuss how Kerberos is used with Hadoop and HBase to provide User Authentication, and how HBase implements User Authorization to grant users permissions for particular actions on a specified set of data.
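To make the granularity point concrete, here is a toy sketch of authorization at the table, column family, and column qualifier levels. It is not HBase's implementation: the `Grant` class, the `is_allowed` function, and the sample users and tables are all hypothetical. The only assumption it illustrates is that a grant at a broader scope (say, a table) also covers the narrower scopes inside it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission record; None at a level means 'everything at that level'."""
    user: str
    actions: frozenset            # e.g. frozenset("RW") for read and write
    table: Optional[str] = None
    family: Optional[str] = None
    qualifier: Optional[str] = None

    def covers(self, table, family, qualifier):
        # A grant covers a request when every level the grant pins down
        # matches the request; levels left as None act as wildcards.
        pairs = ((self.table, table), (self.family, family), (self.qualifier, qualifier))
        return all(granted is None or granted == requested for granted, requested in pairs)


def is_allowed(grants, user, action, table, family=None, qualifier=None):
    """Return True if any grant gives `user` the `action` on the requested scope."""
    return any(
        g.user == user and action in g.actions and g.covers(table, family, qualifier)
        for g in grants
    )


# Illustrative data only: alice may read and write the whole table,
# bob may only read one column family.
grants = [
    Grant("alice", frozenset("RW"), table="orders"),
    Grant("bob", frozenset("R"), table="orders", family="public"),
]
```

With these grants, `is_allowed(grants, "bob", "R", "orders", "public", "name")` is true, while the same request against a different column family, or a write to the same family, is refused. A firewall alone cannot express that distinction.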
When you think about security, remember "Accumulo: Why The World Needs Another NoSQL Database." Accumulo was written to provide cell-level security.
Nice idea, but the burden of administering cell-level authorizations is going to lead to sloppy security practices, or to inadvisedly granting higher-level permissions to some users.
Not to mention the truck-sized security hole in Accumulo for imported data changing access tokens.
You can get a lot of security mileage out of HBase and Kerberos long before you get to cell-level security permissions.
Another interesting aspect of this is to not expect the database to do everything. From that perspective, the current Orion System developed for the DIA provides a nice tradeoff here. Documents are stored in MongoDB in encrypted form and in Elasticsearch. Elasticsearch queries return only document IDs (not contents) based on user credentials and the ACM markings associated with each document. Each query is wrapped with the appropriate security query based on user credentials. The end result is quick retrieval of a list of authorized document IDs to fetch from MongoDB.
Although this is not generic enough for business purposes, Elasticsearch plugins could be modularized to generate wrapper queries based on an alternate form of user credentials, access control markings, and roll-up algorithms. Another approach is for businesses to adopt aspects of CAPCO marking standards for their information and user credentials to control access.
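As a rough illustration of the wrapper-query idea, the sketch below builds an Elasticsearch search body that combines a user's query with a security filter and asks for no document source, so only IDs come back in the hits. This is not the Orion System's actual query: the function name and the single-valued `access_level` field are assumptions made for the example, and real ACM markings would need a richer filter.

```python
from typing import Sequence


def wrap_with_security_filter(user_query: dict, allowed_levels: Sequence[str]) -> dict:
    """Wrap a user's query so only documents at an allowed level can match.

    `access_level` is a hypothetical keyword field holding one level per document.
    """
    return {
        # Do not return document contents; each hit still carries its _id.
        "_source": False,
        "query": {
            "bool": {
                # The user's own query decides relevance.
                "must": [user_query],
                # The security clause only includes or excludes documents.
                "filter": [{"terms": {"access_level": sorted(allowed_levels)}}],
            }
        },
    }


body = wrap_with_security_filter({"match": {"body": "hbase"}}, ["public", "internal"])
```

The IDs returned for `body` would then be used to fetch and decrypt the documents from the separate store.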
For general information about the Orion program see
http://www.orionprogram.com/slicksheets/13-0192-Orion%20Fold%20out_v6.pdf
Comment by Robert of Fairfax — March 16, 2013 @ 3:38 pm
Robert, good point.
A component oriented approach can result in richer features for each component than a single monolithic solution.
Which makes me wonder: What if topic map capabilities were part of outgoing query or incoming results pipelines, with other components handling storage, querying, and other required features?
Comment by Patrick Durusau — March 17, 2013 @ 4:58 am