Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 9, 2013

Migrating to MapReduce 2 on YARN (For Users)

Filed under: Hadoop YARN,MapReduce 2.0 — Patrick Durusau @ 8:10 pm

Migrating to MapReduce 2 on YARN (For Users) by Sandy Ryza.

From the post:

In Apache Hadoop 2, YARN and MapReduce 2 (MR2) are long-needed upgrades for scheduling, resource management, and execution in Hadoop. At their core, the improvements separate cluster resource management capabilities from MapReduce-specific logic. They enable Hadoop to share resources dynamically between MapReduce and other parallel processing frameworks, such as Cloudera Impala; allow more sensible and finer-grained resource configuration for better cluster utilization; and permit Hadoop to scale to accommodate more and larger jobs.

In this post, users of CDH (Cloudera’s distribution of Hadoop and related projects) who program MapReduce jobs will get a guide to the architectural and user-facing differences between MapReduce 1 (MR1) and MR2. (MR2 is the default processing framework in CDH 5, although MR1 will continue to be supported.) Operators/administrators can read a similar post designed for them here.

From further within the post:

MR2 supports both the old (“mapred”) and new (“mapreduce”) MapReduce APIs used for MR1, with a few caveats. The difference between the old and new APIs, which concerns user-facing changes, should not be confused with the difference between MR1 and MR2, which concerns changes to the underlying framework. CDH 4 and CDH 5 support the new and old MapReduce APIs as well as both MR1 and MR2. (Now, go back and read this paragraph again, because the naming is often a source of confusion.) (Emphasis added.)

And under Job Configuration:

As in MR1, job configuration options can be specified on the command line, in Java code, or in the mapred-site.xml on the client machine in the same way they previously were. Most job configuration options, with rare exceptions, that were available in MR1 work in MR2 as well. For consistency and clarity, many options have been given new names. The older names are deprecated, but will still work for the time being. The exceptions are mapred.child.ulimit and all options relating to JVM reuse, which are no longer supported. (Emphasis added.)

That’s all very reassuring.

Are your MapReduce engineers using the old (deprecated) names, the new names, or some combination of both?

As software evolves, name changes cannot be avoided, and no doubt Cloudera has tried to avoid gratuitous ones.

But bottom line, isn’t it your responsibility to track internal use of names, for consistency and maintenance?
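
One way to start tracking is a small audit of the option names your jobs actually set. The following is a minimal sketch, not anything from the Cloudera post: the function name `audit_option_names` is hypothetical, the rename table is only a small illustrative subset of Hadoop's deprecated-properties list, and `mapred.job.reuse.jvm.num.tasks` is assumed here as a representative JVM reuse option.

```python
# Illustrative subset of MR1 -> MR2 option renames (assumption: extend this
# table from the deprecated-properties list for your Hadoop version).
RENAMED = {
    "mapred.job.name": "mapreduce.job.name",
    "mapred.map.tasks": "mapreduce.job.maps",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
}

# Options the quoted post says no longer work in MR2. The JVM reuse name
# below is one assumed example; the post only says "all options relating
# to JVM reuse".
UNSUPPORTED = {
    "mapred.child.ulimit",
    "mapred.job.reuse.jvm.num.tasks",
}


def audit_option_names(options):
    """Classify the option names in a job configuration.

    `options` is a dict of option name -> value. Returns a dict with:
      "deprecated":  old name -> suggested new name
      "unsupported": names that MR2 no longer honors
      "other":       names not found in either table above
    """
    report = {"deprecated": {}, "unsupported": [], "other": []}
    for name in sorted(options):
        if name in UNSUPPORTED:
            report["unsupported"].append(name)
        elif name in RENAMED:
            report["deprecated"][name] = RENAMED[name]
        else:
            report["other"].append(name)
    return report


if __name__ == "__main__":
    job_options = {
        "mapred.reduce.tasks": "4",
        "mapred.child.ulimit": "1048576",
        "mapreduce.job.name": "wordcount",
    }
    print(audit_option_names(job_options))
```

Note that "other" only means a name is not in the tables above, not that it is a current MR2 name. The tables are where your own record of name usage would live.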
