Authorization and Authentication In Hadoop by Jon Natkins.
From the post:
One of the more confusing topics in Hadoop is how authorization and authentication work in the system. The first and most important thing to recognize is the subtle, yet extremely important, differentiation between authorization and authentication, so let’s define these terms first:
Authentication is the process of determining whether someone is who they claim to be.
Authorization is the function of specifying access rights to resources.
In simpler terms, authentication is a way of proving who I am, and authorization is a way of determining what I can do.
Let me see if I can summarize the authentication part: if you are responsible for a Hadoop cluster and unauthenticated users can access it, you should have a backup job lined up.
Hadoop doesn’t have authentication enabled by default, but authentication for access to the cluster could be performed by some other mechanism, such as controlling access to the network where the cluster resides.
There are any number of ways to do authentication, but leaving a network asset without any authentication is a recipe for being fired when it is discovered.
Authorization regulates access and usage of cluster assets.
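As a rough sketch of what checking those settings might look like, the Python below reads a core-site.xml document and reports the two Hadoop properties that govern this: hadoop.security.authentication (stock default "simple", meaning no real authentication) and hadoop.security.authorization (stock default "false", which controls service-level authorization). The function names and the sample XML are made up for illustration, and a real cluster will have more to review than these two values.

```python
import xml.etree.ElementTree as ET

# Stock Hadoop defaults: "simple" trusts whatever user name the client
# reports, and service-level authorization is switched off.
DEFAULTS = {
    "hadoop.security.authentication": "simple",
    "hadoop.security.authorization": "false",
}


def security_settings(core_site_xml):
    """Return the two security properties found in a core-site.xml string.

    Properties missing from the file fall back to the stock defaults.
    (Hypothetical helper, not part of any Hadoop tooling.)
    """
    settings = dict(DEFAULTS)
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in settings:
            settings[name] = value
    return settings


def is_secured(settings):
    """True only if Kerberos authentication and authorization are both on."""
    return (
        settings["hadoop.security.authentication"].lower() == "kerberos"
        and settings["hadoop.security.authorization"].lower() == "true"
    )


# Illustrative sample: authentication is configured, authorization is not.
SAMPLE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    found = security_settings(SAMPLE)
    for key, value in found.items():
        print(key, "=", value)
    print("secured:", is_secured(found))
```

Run against the sample, this reports Kerberos authentication but leaves authorization at its default, so the cluster would not count as secured under this (deliberately strict) check.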
Here’s the test for authentication and authorization on your mission-critical Hadoop cluster. While sitting in front of your cluster admin’s desk, ask for a copy of the authentication and authorization policies and settings for your cluster. If they can’t send it to a printer, you need another cluster admin. It is really that simple.