Recommendation with Apache Mahout in CDH3 by Josh Patterson.
From the introduction:
The amount of information we are exposed to on a daily basis is far outstripping our ability to consume it, leaving many of us overwhelmed by the amount of new content we have available. Ideally we’d like machines and algorithms to help us find the more interesting (for us individually) things so we more easily focus our attention on items of relevance.
Have you ever been recommended a friend on Facebook or an item you might be interested in on Amazon? If so then you’ve benefitted from the value of recommendation systems. Recommendation systems apply knowledge discovery techniques to the problem of making recommendations that are personalized for each user. Recommendation systems are one way we can use algorithms to help us sort through the masses of information to find the “good stuff” in a very personalized way.
Due to the explosion of web traffic and users the scale of recommendation poses new challenges for recommendation systems. These systems face the dual challenge of producing high quality recommendations while also calculating recommendations for millions of users. In recent years collaborative filtering (CF) has become popular as a way to effectively meet these challenges. CF techniques start off by analyzing the user-item matrix to identify relationships between different users or items and then use that information to produce recommendations for each user.
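To make the user-item matrix idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is my own illustration, not code from Patterson's post, and it does not use Mahout. The toy ratings, the names `cosine` and `recommend`, and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy user-item matrix: user -> {item: rating}.
# These values are made up for illustration only.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob":   {"A": 4, "B": 5, "C": 5},
    "carol": {"B": 1, "C": 2, "D": 5},
}

def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, top_n=3):
    """Predict ratings for items the user has not rated yet.

    Each prediction is a similarity-weighted average of the ratings
    given by the other users who rated that item.
    """
    target = ratings[user]
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(target, other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item in target:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    ranked = sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]

if __name__ == "__main__":
    for item, score in recommend("alice", ratings):
        print(item, round(score, 2))
```

The sketch follows the two steps in the quoted paragraph: first it finds relationships between users (the similarity scores), then it uses them to rank items the target user has not seen. At the scale Patterson describes, with millions of users, comparing every pair of users this way is impractical on one machine, which is the motivation for running the computation on Hadoop.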
If you were to use this post as an introduction to recommendation with Apache Mahout, is there anything you would change, remove, or add?
I am working on my own answer to that question, but I am curious what you think.
I want to use this and similar material in a graduate library course, more to demonstrate the principles than to turn any of the students into Hadoop hackers. (Although that would be a nice result as well.)