Hadoop: What it is and how people use it: my own summary by Bob DuCharme.
From the post:
The web offers plenty of introductions to what Hadoop is about. After reading up on it and trying it out a bit, I wanted to see if I could sum up what I see as the main points as concisely as possible. Corrections welcome.
Hadoop is an open source Apache project consisting of several modules. The key ones are the Hadoop Distributed File System (whose acronym is trademarked, apparently) and MapReduce. HDFS lets you distribute storage across multiple systems, and MapReduce lets you distribute processing across multiple systems by running your “Map” logic on the distributed nodes and then your “Reduce” logic to gather up the results of the map processes, with a master node coordinating it all.
This ability to spread out storage and processing makes it easier to do large-scale processing without requiring large-scale hardware. You can spread the processing across whatever boxes you have lying around, or across virtual machines on a cloud platform that you spin up for only as long as you need them. This ability to inexpensively scale out has made Hadoop one of the most popular technologies associated with the buzzphrase “Big Data.”
…
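To make the Map/Reduce division concrete, here is a sketch of the canonical word-count job written against Hadoop's Java MapReduce API. (This example is not from Bob's post; the class names and the input/output paths are illustrative.) The map step runs on the nodes holding the data and emits a (word, 1) pair for each token; the reduce step sums those pairs per word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The "Map" half: runs on the nodes where the input blocks live,
  // emitting a (word, 1) pair for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The "Reduce" half: gathers the counts emitted for a given word
  // across the cluster and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Compile this into a jar and launch it with hadoop jar wordcount.jar WordCount, giving it HDFS input and output directories; Hadoop handles shipping the code to the nodes and shuffling the map output to the reducers.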
If you're new to Hadoop, or if you're up to your elbows in it and need a literate summary to forward to others, I think this post does the trick.
Bob covers the major components of the Hadoop ecosystem without getting lost in the weeds.
Recommended reading.