Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 2, 2015

Operationalizing a Hadoop Eco-System

Filed under: Hadoop,Hive — Patrick Durusau @ 4:04 pm

(Part 1: Installing & Configuring a 3-node Cluster) by Louis Frolio.

From the post:

The objective of DataTechBlog is to bring the many facets of data, data tools, and the theory of data to those curious about data science and big data. The relationship between these disciplines and data can be complex. However, if careful consideration is given to a tutorial, it is a practical expectation that the layman can be brought online quickly. With that said, I am extremely excited to bring this tutorial on the Hadoop Eco-system. Hadoop & MapReduce (at a high level) are not complicated ideas. Basically, you take a large volume of data and spread it across many servers (HDFS). Once at rest, the data can be acted upon by the many CPU’s in the cluster (MapReduce). What makes this so cool is that the traditional approach to processing data (bring data to cpu) is flipped. With MapReduce, CPU is brought to the data. This “divide-and-conquer” approach makes Hadoop and MapReduce indispensable when processing massive volumes of data. In part 1 of this multi-part series, I am going to demonstrate how to install, configure and run a 3-node Hadoop cluster. Finally, at the end I will run a simple MapReduce job to perform a unique word count of Shakespeare’s Hamlet. Future installments of this series will include topics such as: 1. Creating an advanced word count with MapReduce, 2. Installing and running Hive, 3. Installing and running Pig, 4. Using Sqoop to extract and import structured data into HDFS. The goal is to illuminate all the popular and useful tools that support Hadoop.

Operationalizing a Hadoop Eco-System (Part 2: Customizing Map Reduce)

Operationalizing a Hadoop Eco-System (Part 3: Installing and using Hive)

Be forewarned that Louis suggests hosting three Linux VMs on a fairly robust machine. He worked on a Windows 7 x64 machine with 1 TB of storage and 24G of RAM. (How much of that was used by Windows and Office he doesn’t say. 😉 )

The last post in this series was in April 2014 so you may have to look elsewhere for tutorials on Pig and Sqoop.

Enjoy!

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress