Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 15, 2014

Diving into HDFS

Filed under: Hadoop,HDFS — Patrick Durusau @ 2:22 pm

Diving into HDFS by Julia Evans.

From the post:

Yesterday I wanted to start learning about how HDFS (the Hadoop Distributed File System) works internally. I knew that

  • It’s distributed, so one file may be stored across many different machines
  • There’s a namenode, which keeps track of where all the files are stored
  • There are data nodes, which contain the actual file data

But I wasn’t quite sure how to get started! I knew how to navigate the filesystem from the command line (hadoop fs -ls /, and friends), but not how to figure out how it works internally.

Colin Marc pointed me to this great library called snakebite which is a Python HDFS client. In particular he pointed me to the part of the code that reads file contents from HDFS. We’re going to tear it apart a bit and see what exactly it does!

Be cautious reading Julia’s post!

Her enthusiasm can be infectious. 😉

Seriously, I take Julia’s posts as the way CS topics are supposed to be explored. While there is hard work, there is also the thrill of discovery. Not a bad approach to have.
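
For readers who want something to poke at before reading Julia's walkthrough, here is a toy sketch of the three facts she lists: a file is split across machines, a namenode tracks where the pieces are, and data nodes hold the bytes. This is not snakebite and not the HDFS wire protocol. The names `NameNode`, `DataNode`, `write_file`, and `read_file` are hypothetical, invented for illustration, and the 4-byte block size is an assumption chosen to keep the example small. Real HDFS uses far larger blocks and replicates each one.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy data node: stores raw block bytes keyed by block id."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class NameNode:
    """Toy namenode: stores metadata only, never file contents."""
    # path -> ordered list of (block_id, datanode_name)
    block_map: dict = field(default_factory=dict)


def write_file(namenode, datanodes, path, data, block_size=4):
    """Split data into fixed-size blocks and place them round-robin."""
    locations = []
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#{index}"
        node = datanodes[index % len(datanodes)]
        node.blocks[block_id] = data[offset:offset + block_size]
        locations.append((block_id, node.name))
    namenode.block_map[path] = locations


def read_file(namenode, datanodes, path):
    """Ask the namenode where the blocks are, then fetch them in order."""
    by_name = {node.name: node for node in datanodes}
    return b"".join(
        by_name[node_name].blocks[block_id]
        for block_id, node_name in namenode.block_map[path]
    )


# Hypothetical three-node cluster holding one small file.
cluster = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode()
write_file(namenode, cluster, "/hello.txt", b"hello, hdfs!")
print(namenode.block_map["/hello.txt"])
print(read_file(namenode, cluster, "/hello.txt"))
```

The point of the sketch is the split in responsibilities: a reader asks the namenode only for locations, then pulls the bytes from the data nodes. That is the shape of the read path Julia traces through in her post.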
