Learning Mahout : Collaborative Filtering by Sujit Pal.
From the post:
My Mahout in Action (MIA) book has been collecting dust for a while now, waiting for me to get around to learning about Mahout. Mahout is evolving quite rapidly, so the book is a bit dated now, but I decided to use it as a guide anyway as I work through the various modules in the current (GA) 0.7 distribution.
My objective is to learn about Mahout initially from a client perspective, i.e., find out what ML modules (e.g., clustering, logistic regression, etc.) are available, which algorithms are supported within each module, and how to use them from my own code. Although Mahout provides non-Hadoop implementations for almost all its features, I am primarily interested in the Hadoop implementations. Initially I just want to figure out how to use it (with custom code to tweak behavior). Later, I would like to understand how each algorithm is represented as a (possibly multi-stage) M/R job so I can build similar implementations.
I am going to write about my progress, mainly in order to populate my cheat sheet in the sky (i.e., for future reference). Any code I write will be available in this GitHub (Scala) project.
The first module covered in the book is Collaborative Filtering. Essentially, it is a technique for predicting a user's preferences from the preferences of others in the group. There are two main approaches: user-based and item-based. In user-based filtering, the objective is to look for users similar to the given user, then use the ratings from these similar users to predict a preference for the given user. In item-based recommendation, similarities between pairs of items are computed, and preferences are then predicted for the given user using a combination of the user’s current item preferences and the similarity matrix.
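To make the two approaches concrete, here is a minimal sketch in plain Python. It is not Mahout's API and not the code from the post: the toy ratings, the function names, and the choice of cosine similiarity over co-rated entries are all assumptions made for illustration, and both predictors simply take a similarity-weighted average.

```python
from math import sqrt

# Hypothetical toy data (not from the post): user -> {item: rating}.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def cosine(u, v):
    """Cosine similarity restricted to the keys both vectors share (an assumed metric)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(u[k] ** 2 for k in shared))
    norm_v = sqrt(sum(v[k] ** 2 for k in shared))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_user_based(ratings, user, item):
    """Weight other users' ratings of `item` by their similarity to `user`."""
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        sim = cosine(ratings[user], prefs)
        num += sim * prefs[item]
        den += abs(sim)
    return num / den if den else None  # None: no usable neighbours

def item_vector(ratings, item):
    """Column view of the data: user -> rating, for one item."""
    return {u: prefs[item] for u, prefs in ratings.items() if item in prefs}

def predict_item_based(ratings, user, item):
    """Weight the user's own ratings by each rated item's similarity to `item`."""
    target = item_vector(ratings, item)
    num = den = 0.0
    for rated, value in ratings[user].items():
        if rated == item:
            continue
        sim = cosine(target, item_vector(ratings, rated))
        num += sim * value
        den += abs(sim)
    return num / den if den else None  # None: no similar items rated

if __name__ == "__main__":
    print(predict_user_based(ratings, "alice", "D"))
    print(predict_item_based(ratings, "alice", "D"))
```

The user-based predictor compares rows of the ratings table (users), while the item-based one compares columns (items). A production system would typically precompute the item-item similarities, which is the "similarity matrix" the post refers to.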
While you are working your way through this post, keep in mind: Collaborative filtering with GraphChi.
Question: What if you are an outlier?
Telephone marketing interviews with me get shortened by responses like: “X? Is that a TV show?”
How would you go about piercing the marketing veil to recommend your preferences?
Now that is a product to which even I might subscribe. (But don’t advertise on TV, I won’t see it.)