An update on Apache Hadoop 1.0

An update on Apache Hadoop 1.0 from Cloudera by Charles Zedlewski.

From the post:

Some users & customers have asked about the most recent release of Apache Hadoop, v1.0: what’s in it, what it followed and what it preceded. To explain this we should start with some basics of how Apache projects release software:

By and large, in Apache projects new features are developed on a main codeline known as “trunk.” Occasionally very large features are developed on their own branches with the expectation they’ll later merge into trunk. While new features usually land in trunk before they reach a release, there is not much expectation of quality or stability. Periodically, candidate releases are branched from trunk. Once a candidate release is branched it usually stops getting new features. Bugs are fixed and after a vote, a release is declared for that particular branch. Any member of the community can create a branch for a release and name it whatever they like.

That is about as clear an explanation of the Apache process and the current state of Hadoop releases as is possible, given the facts Charles had to work with.

Still, for the average Cloudera release user, I think something along the lines of:

There has been some confusion over the jump from 0.2* versions of Hadoop to a release of Hadoop 1.0 at Apache.

You have not missed various 0.3* and later releases!

Like political candidates, Apache releases can call themselves anything they like. The Hadoop project leaders decided to call a recent release Hadoop 1.0. Given the confusion this caused, maybe we will see more orderly naming in the future. Maybe not.

If you have CDH3, then you have all the features of the recent “Hadoop 1.0” and have had them for almost one year. (If you don’t have CDH3, you may wish to consider upgrading.)

(then conclude with)

[T]he CDH engineering team is comprised of more than 20 engineers that are committers and PMC members of the various Apache projects who can shape the innovation of the extended community into a single coherent system. It is why we believe demonstrated leadership in open source contribution is the only way to harness the open innovation of the Apache Hadoop ecosystem.

would have been better.

People are looking for a simple explanation with some reassurance that all is well.
