New in CDH 5.3: Transparent Encryption in HDFS by Charles Lamb, Yi Liu & Andrew Wang
From the post:
Apache Hadoop 2.6 adds support for transparent encryption to HDFS. Once configured, data read from and written to specified HDFS directories will be transparently encrypted and decrypted, without requiring any changes to user application code. This encryption is also end-to-end, meaning that data can only be encrypted and decrypted by the client. HDFS itself never handles unencrypted data or data encryption keys. All these characteristics improve security, and HDFS encryption can be an important part of an organization-wide data protection story.
Cloudera’s HDFS and Cloudera Navigator Key Trustee (formerly Gazzang zTrustee) engineering teams did this work under HDFS-6134 in collaboration with engineers at Intel as an extension of earlier Project Rhino work. In this post, we’ll explain how it works, and how to use it.
Excellent news! Especially for data centers that are responsible for the data of others.
The authors do mention the problem of rogue users, that is, a threat on the client side:
Finally, since each file is encrypted with a unique DEK and each EZ can have a different key, the potential damage from a single rogue user is limited. A rogue user can only access EDEKs and ciphertext of files for which they have HDFS permissions, and can only decrypt EDEKs for which they have KMS permissions. Their ability to access plaintext is limited to the intersection of the two. In a secure setup, both sets of permissions will be heavily restricted.
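The "intersection of the two" is easy to picture as a set operation. What follows is only a toy sketch of that reasoning, not how HDFS or the KMS actually evaluate permissions. The function name, the file paths, and the key names are all hypothetical, and it assumes a simple mapping from each file to the key of its encryption zone.

```python
def readable_plaintext(hdfs_readable, kms_decryptable, file_to_zone_key):
    """Return the files whose plaintext a user could recover.

    hdfs_readable: set of file paths the user may read in HDFS
                   (gives access to ciphertext and EDEKs).
    kms_decryptable: set of zone key names the user may use in the KMS
                     to decrypt EDEKs.
    file_to_zone_key: dict mapping file path -> key name of its encryption zone.
    All three structures are illustrative assumptions, not real Hadoop APIs.
    """
    return {
        path
        for path in hdfs_readable
        if file_to_zone_key.get(path) in kms_decryptable
    }


# Hypothetical cluster layout: three encryption zones, each with its own key.
file_to_zone_key = {
    "/finance/q1.csv": "finance-key",
    "/finance/q2.csv": "finance-key",
    "/hr/payroll.csv": "hr-key",
    "/eng/design.txt": "eng-key",
}

# A rogue user with HDFS read access to two files and KMS access to two keys.
hdfs_readable = {"/finance/q1.csv", "/hr/payroll.csv"}
kms_decryptable = {"finance-key", "eng-key"}

exposed = readable_plaintext(hdfs_readable, kms_decryptable, file_to_zone_key)
print(sorted(exposed))
```

In this made-up layout only one file is exposed: the user can read the payroll ciphertext but cannot decrypt its EDEK, and holds a key for the engineering zone but cannot read any file in it.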
Just so you know, it won’t be a security problem with Hadoop 2.6 if Sony is hacked while running on Hadoop 2.6 at a data center. Anyone who copies the master access codes from sticky notes will be able to do a lot of damage. North Korea will be the whipping boy for major future cyberhacks. That’s policy, not facts, talking.
For users who do understand what secure environments should look like, this is a great advance.