MongoDB Hadoop Connector Announced
From the post:
10gen is pleased to announce the availability of our first GA release of the MongoDB Hadoop Connector, version 1.0. This release was a long-term goal and represents the culmination of over a year of work to bring our users a solid integration layer between their MongoDB deployments and Hadoop clusters for data processing. Available immediately, this connector supports many of the major Hadoop versions and distributions from 0.20.x onwards.
The core feature of the Connector is the ability to read MongoDB data into Hadoop MapReduce jobs and to write the results of MapReduce jobs back out to MongoDB. Users may choose to use MongoDB reads and writes together or separately, as best fits each use case. Our goal is to continue to build support for the components in the Hadoop ecosystem which our users find useful, based on feedback and requests.
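To make that concrete, here is a minimal sketch of a word-count style job wired to MongoDB on both ends. It assumes the connector's MongoInputFormat/MongoOutputFormat classes and the mongo.input.uri / mongo.output.uri configuration keys described in the connector's documentation; the demo database, the collections, and the "text" field are made-up placeholders, so treat this as an illustration rather than tested code.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class MongoWordCount {

    // MongoInputFormat hands each document to the mapper as a BSONObject,
    // keyed by its _id (hence the Object key type).
    public static class TokenMapper
            extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object id, BSONObject doc, Context ctx)
                throws IOException, InterruptedException {
            // "text" is a hypothetical field in the input documents.
            for (String token : doc.get("text").toString().split("\\s+")) {
                word.set(token);
                ctx.write(word, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            // Each (key, value) pair the reducer emits is persisted as a
            // document in the output collection.
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Read input documents from demo.in and write results to demo.out.
        conf.set("mongo.input.uri", "mongodb://localhost:27017/demo.in");
        conf.set("mongo.output.uri", "mongodb://localhost:27017/demo.out");

        Job job = new Job(conf, "mongo word count");
        job.setJarByClass(MongoWordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The only MongoDB-specific pieces are the two URI settings and the two format classes; the mapper and reducer are otherwise standard Hadoop MapReduce.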
For this initial release, we have also provided support for:
- writing to MongoDB from Pig (thanks to Russell Jurney for all of his patches and improvements to this feature; see the sketch after this list)
- writing to MongoDB from the Flume distributed logging system
- writing MapReduce jobs in Python that read from and write to MongoDB, via Hadoop Streaming.
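As a quick illustration of the Pig integration, the following sketch drives a Pig STORE from Java through Pig's PigServer API. MongoStorage is the connector's Pig store function; the jar name, the input file, and the collection URI are hypothetical placeholders, and this is a sketch under those assumptions rather than a verified recipe.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigToMongo {
    public static void main(String[] args) throws Exception {
        // Run Pig locally; ExecType.MAPREDUCE would target a live cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // The connector's Pig support must be visible to Pig; this jar
        // name is a placeholder.
        pig.registerJar("mongo-hadoop-pig.jar");

        // Load a hypothetical tab-separated file of (user, score) records.
        pig.registerQuery(
            "scores = LOAD 'scores.tsv' AS (user:chararray, score:int);");

        // Store each tuple as a document in the demo.scores collection
        // through the connector's MongoStorage store function.
        pig.store("scores", "mongodb://localhost:27017/demo.scores",
                  "com.mongodb.hadoop.pig.MongoStorage()");
    }
}
```

The same STORE statement can of course be run directly from a Pig Latin script; PigServer is used here only to keep the example in one language.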
Hadoop Streaming was one of the toughest features for the 10gen team to build. Look for a more technical post on the MongoDB blog in the next week or two detailing the issues we encountered and how to use this feature effectively.
Question: Is anyone working on a matrix of Hadoop connectors and their capabilities? A summary resource on Hadoop connectors might be of value.