How to Build a Recommendation Engine by John F. McGowan.
From the post:
This article shows how to build a simple recommendation engine using GNU Octave, a high-level interpreted language, primarily intended for numerical computations, that is mostly compatible with MATLAB. A recommendation engine is a program that recommends items such as books and movies for customers, typically of a web site such as Amazon or Netflix, to purchase. Recommendation engines frequently use statistical and mathematical methods to estimate what items a customer would like to buy or would benefit from purchasing.
From a purely business point of view, one would like to maximize the profit from a customer, discounted for time (a dollar today is worth more than a dollar next year), over the duration that the customer is a customer of the business. In a long term relationship with a customer, this probably means that the customer needs to be happy with most purchases and most recommendations.
Recommendation engines are “hot” right now. There are many attempts to apply advanced statistics and mathematics to predict what customers will buy, what purchases will make customers happy and buy again, and what purchases deliver the most value to customers. Data scientists are trying to apply a range of methods with fancy technical names such as principal component analysis (PCA), neural networks, and support vector machines (SVM) — amongst others — to predicting successful purchases and personalizing recommendations for individual customers based on their stated preferences, purchasing history, demographics and other factors.
This article presents a simple recommendation engine using Pearson’s product moment correlation coefficient, also known as the linear correlation coefficient. The engine uses the correlation coefficient to identify customers with similar purchasing patterns, and presumably tastes, and recommends items purchased by one customer to the other similar customer who has not purchased those items.
This is probably not the recommendation engine you will use for commercial deployment, but it will give you a good start on understanding the principles of recommendation engines.
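To make the idea in the excerpt concrete, here is a minimal sketch of correlation-based recommendation. It is written in Python rather than the GNU Octave used in the article, and it is not the author's code: the function names `pearson` and `recommend`, the item list, and the purchase data are all made up for illustration. It assumes each customer is represented by a 0/1 vector of purchases over the same list of items.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector; this sketch
        # treats that case as "no similarity".
        return 0.0
    return cov / sqrt(var_x * var_y)

def recommend(purchases, customer):
    """Return (most similar customer, item indices they bought that `customer` has not)."""
    target = purchases[customer]
    best_name, best_r = None, -2.0  # -2.0 is below any possible correlation
    for name, row in purchases.items():
        if name == customer:
            continue
        r = pearson(target, row)
        if r > best_r:
            best_name, best_r = name, r
    if best_name is None:
        return None, []
    neighbour = purchases[best_name]
    items = [i for i, (mine, theirs) in enumerate(zip(target, neighbour))
             if theirs and not mine]
    return best_name, items

# Hypothetical data: 1 means the customer bought the item, 0 means not.
ITEMS = ["book_a", "book_b", "movie_c", "movie_d", "album_e"]
PURCHASES = {
    "alice": [1, 1, 0, 0, 1],
    "bob":   [1, 1, 0, 1, 1],
    "carol": [0, 0, 1, 1, 0],
}

if __name__ == "__main__":
    name, items = recommend(PURCHASES, "alice")
    print(name, [ITEMS[i] for i in items])
```

With this made-up data, alice correlates most strongly with bob (r is roughly 0.61, against -1 for carol), so the sketch suggests movie_d, the one item bob bought that alice has not. Using only the single nearest customer is a simplification chosen for brevity; the article's engine may weigh similar customers differently.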
My interest in recommendations isn’t so much in identifying the subjects of recommendation, which are topics in their own right, as in probing the basis for subject identification by multiple users.
That is, there is some identification that underlies a choice of one book or movie over another. It may not be possible to identify the components of that identification, but we do have the aftermath of that identification.
Rather than collapsing dimensions, I am thinking we should expand the dimensions around choices to see if any patterns emerge.
I first saw this at DZone.