Real World Hadoop – Implementing a Left Outer Join in Map Reduce by Matthew Rathbone.
From the post:
This article is part of my guide to map reduce frameworks, in which I implement a solution to a real-world problem in each of the most popular Hadoop frameworks.
If you’re impatient, you can find the code for the map-reduce implementation on my GitHub; otherwise, read on!
The Problem
Let me quickly restate the problem from my original article. I have two datasets:
- User information (id, email, language, location)
- Transaction information (transaction-id, product-id, user-id, purchase-amount, item-description)
Given these datasets, I want to find the number of unique locations in which each product has been sold.
Not as easy a problem as it appears, but I suspect a common one in practice.
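To make the problem statement concrete, here is a minimal in-memory sketch in Python. It is not the MapReduce implementation from the post; it only illustrates the logic of a left outer join from transactions to users followed by a distinct count of locations per product. The sample rows and the function name `unique_locations_per_product` are hypothetical, and the sketch assumes that a transaction whose user has no user record contributes no location (the product is still listed).

```python
from collections import defaultdict

# Hypothetical sample rows; field order follows the dataset descriptions above.
# users: (id, email, language, location)
users = [
    (1, "a@example.com", "EN", "US"),
    (2, "b@example.com", "EN", "GB"),
    (3, "c@example.com", "FR", "FR"),
]
# transactions: (transaction-id, product-id, user-id, purchase-amount, item-description)
transactions = [
    (1, "product-1", 1, 300, "a jumper"),
    (2, "product-1", 2, 300, "a jumper"),
    (3, "product-1", 2, 300, "a jumper"),
    (4, "product-2", 3, 100, "a rubber chicken"),
    (5, "product-2", 9, 100, "a rubber chicken"),  # user 9 has no user record
]


def unique_locations_per_product(users, transactions):
    """Count distinct user locations for each product sold."""
    # Index users by id so each transaction can look up its location.
    location_by_user = {user_id: location for user_id, _email, _lang, location in users}

    locations = defaultdict(set)
    for _tx_id, product_id, user_id, _amount, _desc in transactions:
        # Left outer join: every transaction is kept, even if the user is unknown.
        location = location_by_user.get(user_id)
        seen = locations[product_id]  # touching the key keeps the product in the result
        if location is not None:
            # Assumption: unmatched users do not count as a location.
            seen.add(location)

    return {product: len(locs) for product, locs in locations.items()}


result = unique_locations_per_product(users, transactions)
```

With the sample rows above, `result` maps `product-1` to 2 (US and GB) and `product-2` to 1 (FR only, since user 9 is unmatched). The difficulty the post addresses is that on Hadoop neither dataset can be assumed to fit in memory, so the join has to be expressed in map and reduce steps rather than as a dictionary lookup.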