Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 8, 2012

Are You Confused? (About MR2 and YARN?) Help is on the way!

Filed under: Hadoop,Hadoop YARN,MapReduce 2.0 — Patrick Durusau @ 6:51 pm

MR2 and YARN Briefly Explained by Justin Kestelyn.

Justin writes:

With CDH4 onward, the Apache Hadoop component introduced two new terms for Hadoop users to wonder about: MR2 and YARN. Unfortunately, these terms are mixed up so much that many people are confused about them. Do they mean the same thing, or not?

No, but see Justin’s post for the details. (He also points to a longer post with more details.)

October 4, 2012

YARN Meetup at Hortonworks on Friday, Oct 12

Filed under: Hadoop,Hadoop YARN,Hortonworks — Patrick Durusau @ 4:35 pm

YARN Meetup at Hortonworks on Friday, Oct 12 by Russell Jurney.

From the post:

Hortonworks is hosting an Apache YARN Meetup on Friday, Oct 12, to solicit feedback on the YARN APIs. We’ve talked about YARN before in a four-part series on YARN, parts one, two, three and four.

YARN, or “Apache Hadoop NextGen MapReduce,” has come a long way this year. It is now a full-fledged sub-project of Apache Hadoop and has already been deployed on a massive 2,000 node cluster at Yahoo. Many projects, both open-source and otherwise, such as Storm and S4, are being ported to work on YARN, and many of them are in fairly advanced stages. We also have several individuals implementing one-off or ad-hoc applications on YARN.

This meetup is a good time for YARN developers to catch up and talk more about YARN, its current status and medium-term and long-term roadmap.

OK, it’s probably too late to get cheap tickets but if you are in New York on the 12th of October, take advantage of the opportunity!

And please blog about the meeting, with a note to yours truly! I will post a link to your posting.

September 12, 2012

Apache Hadoop YARN – NodeManager

Filed under: Hadoop YARN,Hortonworks — Patrick Durusau @ 10:06 am

Apache Hadoop YARN – NodeManager by Vinod Kumar Vavilapalli

From the post:

In the previous post, we briefly covered the internals of Apache Hadoop YARN’s ResourceManager. In this post, which is the fourth in the multi-part YARN blog series, we are going to dig deeper into the NodeManager internals and some of the key features that NodeManager exposes. Parts one, two and three are available.

Introduction

The NodeManager (NM) is YARN’s per-node agent, and takes care of the individual compute nodes in a Hadoop cluster. This includes keeping up to date with the ResourceManager (RM), overseeing containers’ life-cycle management, monitoring resource usage (memory, CPU) of individual containers, tracking node health, log management and auxiliary services which may be exploited by different YARN applications.

Administration isn’t high on the “exciting” list, although without good administration, things can get very “exciting.”

NodeManager gives you the monitoring tools to help avoid the latter form of excitement.

September 9, 2012

Apache Hadoop YARN – Concepts and Applications

Filed under: Hadoop,Hadoop YARN — Patrick Durusau @ 4:29 pm

Apache Hadoop YARN – Concepts and Applications by Jim Walker.

From the post:

In our previous post we provided an overview and an outline of the motivation behind Apache Hadoop YARN, the latest Apache Hadoop subproject. In this post we cover the key YARN concepts and walk through how diverse user applications work within this new system.

I thought I had missed a post in this series and I had! 😉

Enjoy!

September 5, 2012

New ‘The Future of Apache Hadoop’ Season!

Filed under: Hadoop,Hadoop YARN,Hortonworks,Zookeeper — Patrick Durusau @ 3:37 pm

OK, the real title is: Four New Installments in ‘The Future of Apache Hadoop’ Webinar Series

From the post:

During the ‘Future of Apache Hadoop’ webinar series, Hortonworks founders and core committers will discuss the future of Hadoop and related projects including Apache Pig, Apache Ambari, Apache Zookeeper and Apache Hadoop YARN.

Apache Hadoop has rapidly evolved to become the leading platform for managing, processing and analyzing big data. Consequently there is a thirst for knowledge on the future direction for Hadoop related projects. The Hortonworks webinar series will feature core committers of the Apache projects discussing the essential components required in a Hadoop Platform, current advances in Apache Hadoop, relevant use-cases and best practices on how to get started with the open source platform. Each webinar will include a live Q&A with the individuals at the center of the Apache Hadoop movement.

Coming to a computer near you:

  • Pig Out on Hadoop (Alan Gates): Wednesday, September 12 at 10:00 a.m. PT / 1:00 p.m. ET
  • Deployment and Management of Hadoop Clusters with Ambari (Matt Foley): Wednesday, September 26 at 10:00 a.m. PT / 1:00 p.m. ET
  • Scaling Apache Zookeeper for the Next Generation of Hadoop Applications (Mahadev Konar): Wednesday, October 17 at 10:00 a.m. PT / 1:00 p.m. ET
  • YARN: The Future of Data Processing with Apache Hadoop (Arun C. Murthy): Wednesday, October 31 at 10:00 a.m. PT / 1:00 p.m. ET

Registration is open so get it on your calendar!

August 31, 2012

Apache Hadoop YARN – ResourceManager

Filed under: Hadoop YARN,Hortonworks — Patrick Durusau @ 3:45 pm

Apache Hadoop YARN – ResourceManager by Arun Murthy

From the post:

This is the third post in the multi-part series to cover important aspects of the newly formed Apache Hadoop YARN sub-project. In our previous posts (part one, part two), we provided the background and an overview of Hadoop YARN, and then covered the key YARN concepts and walked you through how diverse user applications work within this new system.

In this post, we are going to delve deeper into the heart of the system – the ResourceManager.

Worth a read if your data processing needs run towards the big/large end of the spectrum.

August 9, 2012

Apache Hadoop YARN – Background and an Overview

Filed under: Hadoop,Hadoop YARN,MapReduce — Patrick Durusau @ 3:39 pm

Apache Hadoop YARN – Background and an Overview by Arun Murthy.

From the post:

MapReduce – The Paradigm

Essentially, the MapReduce model consists of a first, embarrassingly parallel, map phase where input data is split into discrete chunks to be processed. It is followed by the second and final reduce phase where the output of the map phase is aggregated to produce the desired result. The simple, and fairly restricted, nature of the programming model lends itself to very efficient and extremely large-scale implementations across thousands of cheap, commodity nodes.

Apache Hadoop MapReduce is the most popular open-source implementation of the MapReduce model.

In particular, when MapReduce is paired with a distributed file-system such as Apache Hadoop HDFS, which can provide very high aggregate I/O bandwidth across a large cluster, the economics of the system are extremely compelling – a key factor in the popularity of Hadoop.

One of the keys to this is the lack of data motion, i.e. move compute to data and do not move data to the compute node via the network. Specifically, the MapReduce tasks can be scheduled on the same physical nodes on which data is resident in HDFS, which exposes the underlying storage layout across the cluster. This significantly reduces network I/O and allows the majority of the I/O to take place on the local disk or within the same rack – a core advantage.

An introduction to the architecture of Apache Hadoop YARN that starts with its roots in MapReduce.
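If the two phases in the quoted passage feel abstract, a toy word count may help. This is only a sketch in plain Python, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are my own inventions for illustration, and the grouping step in the middle is an assumption added to connect the two phases, since the excerpt does not describe it.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one input chunk into (word, 1) pairs.

    Each chunk is processed independently of the others, which is what
    makes this phase embarrassingly parallel.
    """
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group intermediate values by key (hypothetical helper, not from the post)."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # A real cluster would run the map calls on many nodes at once;
    # here they simply run one after another.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Made-up input, already split into discrete chunks.
chunks = ["yarn runs mapreduce", "mapreduce runs on yarn"]
counts = word_count(chunks)
print(counts)
```

Nothing in `map_phase` depends on any other chunk, so the work can be spread across as many nodes as there are chunks. Only the reduce step needs to see everything collected for a given key.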

August 3, 2012

Introducing Apache Hadoop YARN

Filed under: Hadoop,Hadoop YARN,HDFS,MapReduce — Patrick Durusau @ 3:03 pm

Introducing Apache Hadoop YARN by Arun Murthy.

From the post:

I’m thrilled to announce that the Apache Hadoop community has decided to promote the next-generation Hadoop data-processing framework, i.e. YARN, to be a sub-project of Apache Hadoop in the ASF!

Apache Hadoop YARN joins Hadoop Common (core libraries), Hadoop HDFS (storage) and Hadoop MapReduce (the MapReduce implementation) as the sub-projects of Apache Hadoop, which, itself, is a Top Level Project in the Apache Software Foundation. Until this milestone, YARN was a part of the Hadoop MapReduce project and now is poised to stand on its own as a sub-project of Hadoop.

In a nutshell, Hadoop YARN is an attempt to take Apache Hadoop beyond MapReduce for data-processing.

As folks are aware, Hadoop HDFS is the data storage layer for Hadoop and MapReduce was the data-processing layer. However, the MapReduce algorithm, by itself, isn’t sufficient for the very wide variety of use-cases we see Hadoop being employed to solve. With YARN, Hadoop now has a generic resource-management and distributed application framework, whereby one can implement multiple data processing applications customized for the task at hand. Hadoop MapReduce is now one such application for YARN and I see several others given my vantage point – in future you will see MPI, graph-processing, simple services etc., all co-existing with MapReduce applications in a Hadoop YARN cluster.

Considering the explosive growth of Hadoop, what new data processing applications do you see emerging first in YARN?
