HDFS Metadata Directories Explained by Chris Nauroth.
From the post:
HDFS metadata represents the structure of HDFS directories and files in a tree. It also includes the various attributes of directories and files, such as ownership, permissions, quotas, and replication factor. In this blog post, I’ll describe how HDFS persists its metadata in Hadoop 2 by exploring the underlying local storage directories and files. All examples shown are from testing a build of the soon-to-be-released Apache Hadoop 2.6.0.
WARNING: Do not attempt to modify metadata directories or files. Unexpected modifications can cause HDFS downtime, or even permanent data loss. This information is provided for educational purposes only.
Persistence of HDFS metadata broadly breaks down into 2 categories of files:
- fsimage – An fsimage file contains the complete state of the file system at a point in time. Every file system modification is assigned a unique, monotonically increasing transaction ID. An fsimage file represents the file system state after all modifications up to a specific transaction ID.
- edits – An edits file is a log that lists each file system change (file creation, deletion or modification) that was made after the most recent fsimage.
Checkpointing is the process of merging the content of the most recent fsimage with all edits applied after that fsimage is merged in order to create a new fsimage. Checkpointing is triggered automatically by configuration policies or manually by HDFS administration commands.
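To make the fsimage/edits relationship concrete, here is a toy sketch of checkpointing (my own illustration, not from the post). It assumes a drastically simplified model: the namespace is a dict mapping a path to its replication factor, and the names FsImage, Edit, and checkpoint are hypothetical. Real HDFS on-disk formats and operation types are far richer than this.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class FsImage:
    """Toy snapshot: namespace state after all edits up to last_txid."""
    last_txid: int
    namespace: Dict[str, int] = field(default_factory=dict)  # path -> replication


@dataclass(frozen=True)
class Edit:
    """Toy edit-log record; every change carries a unique, increasing txid."""
    txid: int
    op: str  # "create", "set_replication", or "delete" (hypothetical op names)
    path: str
    replication: Optional[int] = None


def checkpoint(image: FsImage, edits: Iterable[Edit]) -> FsImage:
    """Merge an image with the edits logged after it into a new image."""
    namespace = dict(image.namespace)  # copy, so the old image stays intact
    last_txid = image.last_txid
    for edit in sorted(edits, key=lambda e: e.txid):
        if edit.txid <= image.last_txid:
            continue  # already reflected in the starting image
        if edit.op in ("create", "set_replication"):
            namespace[edit.path] = edit.replication
        elif edit.op == "delete":
            namespace.pop(edit.path, None)
        else:
            raise ValueError(f"unknown op: {edit.op}")
        last_txid = edit.txid
    return FsImage(last_txid=last_txid, namespace=namespace)
```

Calling checkpoint(old_image, logged_edits) returns a new image whose last_txid is the highest transaction ID applied. That is the sense in which an fsimage represents the file system state up to a specific transaction ID.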
…
When someone says “Do not attempt to modify metadata directories or files,” it is just like waving a red flag in front of a bull!
Translate that to mean that hackers will know how to modify metadata directories or files and the average HDFS developer won’t.
I’m not saying to modify HDFS metadata directories or files on a production system for practice!
Practice somewhere safe, like in a sandbox, but do practice.
Anything that can cause HDFS downtime or permanent data loss should be a matter of interest.