Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 10, 2011

Exploring Hadoop OutputFormat

Filed under: Hadoop — Patrick Durusau @ 8:06 pm

Exploring Hadoop OutputFormat by Jim.Blomo.

From the post:

Hadoop is often used as a part in a larger ecosystem of data processing. Hadoop’s sweet spot, batch processing large amounts of data, can best be put to use by integrating it with other systems. At a high level, Hadoop ingests input files, streams the contents through custom transformations (the Map-Reduce steps), and writes output files back to disk. Last month InfoQ showed how to gain finer control over the first step, ingestion of input files via the InputFormat class. In this article, we’ll discuss how to customize the final step, writing the output files. OutputFormats let you easily interoperate with other systems by writing the result of a MapReduce job in formats readable by other applications. To demonstrate the usefulness of OutputFormats, we’ll discuss two examples: how to split up the result of a job into different directories, and how to write files for a service providing fast key-value lookups.
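The first of the quoted examples is splitting a job's result into different directories. The Python sketch below illustrates that idea without using Hadoop at all. The `DirectoryPerKeyWriter` class and its method names are hypothetical and are not Hadoop's API. The class plays the role of a RecordWriter: it routes each key/value record into a subdirectory named after its key. In Hadoop itself, the old-API `MultipleTextOutputFormat` and the `MultipleOutputs` helper cover similar ground, and their documentation is the authority on how a real job does this.

```python
import os
import tempfile


class DirectoryPerKeyWriter:
    """Hypothetical, Hadoop-free analogue of a RecordWriter that splits a
    job's output into one subdirectory per key.

    Not Hadoop's API: it only mimics an OutputFormat deciding where each
    record is written.
    """

    def __init__(self, base_dir, part_name="part-00000"):
        self.base_dir = base_dir
        self.part_name = part_name  # file name reused inside every subdirectory
        self._files = {}            # one open handle per output path

    def path_for(self, key, value):
        # Routing hook: in this sketch the key alone picks the subdirectory.
        # Assumes keys are safe to use as directory names.
        return os.path.join(self.base_dir, str(key), self.part_name)

    def write(self, key, value):
        path = self.path_for(key, value)
        handle = self._files.get(path)
        if handle is None:
            # Create the subdirectory lazily, the first time a key is seen.
            os.makedirs(os.path.dirname(path), exist_ok=True)
            handle = open(path, "w", encoding="utf-8")
            self._files[path] = handle
        # Tab-separated key/value lines, like TextOutputFormat's default.
        handle.write(f"{key}\t{value}\n")

    def close(self):
        # Flush and close every file opened while writing.
        for handle in self._files.values():
            handle.close()
        self._files.clear()


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    writer = DirectoryPerKeyWriter(out_dir)
    for day, count in [("2011-12-09", 42), ("2011-12-10", 7), ("2011-12-09", 3)]:
        writer.write(day, count)
    writer.close()
    for day in sorted(os.listdir(out_dir)):
        print(day, os.listdir(os.path.join(out_dir, day)))
```

The writer keeps one open handle per output path, so every record for a given key lands in the same file. A job with very many distinct keys would hold that many files open at once, and this sketch makes no attempt to limit that.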

One more set of tools to add to your Hadoop toolbox!
