Why Extended Attributes are Coming to HDFS by Charles Lamb.
From the post:
Extended attributes in HDFS will facilitate at-rest encryption for Project Rhino, but they have many other uses, too.
Many mainstream Linux filesystems implement extended attributes, which let you associate metadata with a file or directory beyond common “fixed” attributes like filesize, permissions, modification dates, and so on. Extended attributes are key/value pairs in which the values are optional; generally, the key and value sizes are limited to some implementation-specific limit. A filesystem that implements extended attributes also provides system calls and shell commands to get, list, set, and remove attributes (and values) to/from a file or directory.
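The four operations named above (set, get, list, remove) can be sketched on a local Linux file with Python's standard `os` module. This is a minimal illustration of the Linux calls, not of the HDFS API. The helper name `xattr_demo` and the `user.author` key and its value are assumptions chosen for the example. The functions exist only on Linux, and the filesystem must support user attributes, so the sketch returns `None` when they are unavailable.

```python
import os
import tempfile


def xattr_demo(path):
    """Set, get, list, and remove one extended attribute on `path`.

    Returns a dict describing what each step observed, or None when the
    platform or filesystem does not support user extended attributes.
    """
    if not hasattr(os, "setxattr"):  # the os.*xattr functions are Linux-only
        return None
    name = "user.author"  # illustrative key in the "user." namespace
    try:
        os.setxattr(path, name, b"cwl")       # set: values are raw bytes
        value = os.getxattr(path, name)       # get: returns bytes
        listed = name in os.listxattr(path)   # list: returns attribute names
        os.removexattr(path, name)            # remove
        still_present = name in os.listxattr(path)
    except OSError:
        # Raised, for example, when the filesystem lacks user xattr support
        # or the path does not exist.
        return None
    return {"value": value, "listed": listed, "still_present": still_present}


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print(xattr_demo(f.name))
```

Values are passed as bytes because the kernel treats them as opaque data. Any encoding is up to the application.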
Recently, my Intel colleague Yi Liu led the implementation of extended attributes for HDFS (HDFS-2006). This work is largely motivated by Cloudera and Intel contributions to bringing at-rest encryption to Apache Hadoop (HDFS-6134; also see this post) under Project Rhino – extended attributes will be the mechanism for associating encryption key metadata with files and encryption zones — but it’s easy to imagine lots of other places where they could be useful.
For instance, you might want to store a document’s author and subject in something like user.author=cwl and user.subject=HDFS. You could store a file checksum in an attribute called user.checksum. Even just comments about a particular file or directory can be saved in an extended attribute.
In this post, you’ll learn some of the details of this feature from an HDFS user’s point of view.
…
Extended attributes sound like an interesting place to tuck away additional information about a file.
Such as the legend to be used to interpret it?