Atepassar Recommendations: Recommending friends with MapReduce and Python by Marcel Caraciolo.
From the post:
In this post I will present one of the techniques used at Atépassar, a Brazilian social network that helps students around Brazil pass the exams for civil service jobs: our recommender system.
(graphic omitted)
I will describe some of the data models that we use and discuss our approach to algorithmic innovation that combines offline machine learning with online testing. For this task we use distributed computing, since we deal with over 140 thousand users. MapReduce is a powerful technique, and we use it by writing Python code with the framework MrJob. I recommend reading more about it in my last post here.
One of our recommender techniques is the simple ‘people you might know’ recommender algorithm. There are in fact several components behind the algorithm, since at Atépassar users can follow other people as well as be followed by them. In this post I will talk about the basic idea of the algorithm, which can be adapted for those other components. The idea is that if person A and person B do not know each other but have a lot of mutual friends, then the system should recommend that they connect with each other.
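The mutual-friends idea in the quoted passage can be sketched in a few lines. This is a minimal illustration in plain Python that imitates the map and reduce phases in memory; it is not the MrJob code from the original post. The names `map_friends`, `reduce_pair`, `recommend`, the `min_mutual` threshold, and the sample graph are all assumptions made for the example, and friendship is assumed to be symmetric, which is simpler than the follower model the post describes.

```python
from collections import defaultdict
from itertools import combinations


def map_friends(user, friends):
    """Map step: emit pairs of people who share `user` as a mutual friend."""
    for friend in friends:
        # Mark existing connections (value None) so the reducer can drop them.
        yield tuple(sorted((user, friend))), None
    for a, b in combinations(sorted(friends), 2):
        # a and b are both friends of `user`, so they share one mutual friend.
        yield (a, b), 1


def reduce_pair(pair, values):
    """Reduce step: count mutual friends, skipping already connected pairs."""
    values = list(values)
    if None in values:
        return None
    return pair, sum(values)


def recommend(graph, min_mutual=2):
    """Run both phases in memory; `min_mutual` is a hypothetical cutoff."""
    grouped = defaultdict(list)
    for user, friends in graph.items():
        for key, value in map_friends(user, friends):
            grouped[key].append(value)  # stands in for the shuffle phase

    results = []
    for pair, values in grouped.items():
        out = reduce_pair(pair, values)
        if out is not None and out[1] >= min_mutual:
            results.append(out)
    # Most mutual friends first, ties broken alphabetically.
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Made-up sample data: each user maps to the set of their friends.
graph = {
    "ana": {"bia", "caio", "davi"},
    "bia": {"ana", "caio", "edu"},
    "caio": {"ana", "bia", "edu"},
    "davi": {"ana"},
    "edu": {"bia", "caio"},
}

print(recommend(graph))
```

In this sample, "ana" and "edu" are not connected but share two friends, so they are the only pair that clears the default threshold. On a real cluster the grouping dictionary would be replaced by the framework's shuffle step.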
Is there a presumption in social recommendation programs that there are no duplicate people in the network, perhaps using different names? If two people have exactly the same friends, is there some chance they could be the same person?
How many “same” friends would you require? 20? 30? 50? Some other number?
I am curious because determining personal identity, and the identity of the people behind two or more entries, may be a matter of pattern matching.
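One way to make the duplicate question concrete is to compare friend sets directly. The sketch below is only an illustration: it flags pairs of accounts whose friend lists overlap heavily, using Jaccard similarity as the pattern-matching measure. The function names, the `min_shared` and `min_similarity` thresholds, and the sample accounts are all assumptions; nothing here comes from Atépassar, and a high score is at most a hint that two entries deserve a closer look.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of friends two accounts have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def possible_duplicates(accounts, min_shared=3, min_similarity=0.8):
    """Flag account pairs with heavily overlapping friend sets.

    Both thresholds are hypothetical; the right values would have to be
    found by testing against real data.
    """
    flagged = []
    for u, v in combinations(sorted(accounts), 2):
        # Leave the two accounts themselves out of the comparison.
        friends_u = accounts[u] - {v}
        friends_v = accounts[v] - {u}
        shared = len(friends_u & friends_v)
        similarity = jaccard(friends_u, friends_v)
        if shared >= min_shared and similarity >= min_similarity:
            flagged.append((u, v, shared, similarity))
    return flagged


# Made-up sample data: two entries with identical friends, one without.
accounts = {
    "m.silva": {"ana", "bia", "caio", "davi"},
    "marcos_silva": {"ana", "bia", "caio", "davi"},
    "edu": {"ana", "bia"},
}

print(possible_duplicates(accounts))
```

Comparing every pair of accounts is quadratic, so at the scale mentioned in the quoted post this comparison would itself be a candidate for a MapReduce job.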
BTW, this is an interesting looking blog. You may want to browse older entries or even subscribe.