Case study: million songs dataset by Danny Bickson.
From the post:
A couple of days ago I wrote about the million songs dataset. Our man in London, Clive Cox from Rummble Labs, suggested we should implement rankings based on item similarity.
Thanks to Clive's suggestion, we now have an implementation of Fabio Aiolli's cost function as explained in the paper A Preliminary Study for a Recommender System for the Million Songs Dataset, which was the winning method in this contest.
Following are detailed instructions on how to use the GraphChi CF toolkit on the million songs dataset to compute user ratings from item similarities.
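To make "user ratings out of item similarities" concrete, here is a small sketch of item-based scoring on binary listen data in the spirit of Aiolli's approach. It is not GraphChi's implementation or command-line interface. The similarity form (an asymmetric cosine with parameter alpha) and the locality exponent q are assumptions about the method, and the function names (item_users, asym_cosine, score_items) and the toy triplets are hypothetical, made up for illustration.

```python
from collections import defaultdict

def item_users(triplets):
    """Map each item (song) to the set of users who listened to it.
    Listens are treated as binary feedback: play counts are ignored."""
    users_of = defaultdict(set)
    for user, item in triplets:
        users_of[item].add(user)
    return users_of

def asym_cosine(users_of, i, j, alpha=0.5):
    """ASSUMED similarity: |U(i) & U(j)| / (|U(i)|**alpha * |U(j)|**(1 - alpha)).
    With alpha = 0.5 this reduces to plain cosine on binary user vectors."""
    ui, uj = users_of[i], users_of[j]
    common = len(ui & uj)
    if common == 0:
        return 0.0
    return common / (len(ui) ** alpha * len(uj) ** (1 - alpha))

def score_items(user, triplets, alpha=0.5, q=1.0):
    """Score every unseen item by summing its similarity (raised to the
    ASSUMED locality exponent q) to each item in the user's history.
    Returns (item, score) pairs, highest score first, ties broken by name."""
    users_of = item_users(triplets)
    history = {item for u, item in triplets if u == user}
    scores = {}
    for i in users_of:
        if i in history:
            continue  # only rank songs the user has not heard yet
        s = sum(asym_cosine(users_of, i, j, alpha) ** q for j in history)
        if s > 0:
            scores[i] = s
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy data: (user, song) listen events.
triplets = [("alice", "s1"), ("alice", "s2"),
            ("bob", "s1"), ("bob", "s3"),
            ("carol", "s2"), ("carol", "s3"), ("carol", "s4")]
print([item for item, _ in score_items("alice", triplets)])
```

On the toy data, alice has heard s1 and s2. s3 shares one listener with each of them and scores about 1.0, while s4 only overlaps with s2 and scores about 0.71, so the script prints ['s3', 's4']. Raising q above 1 sharpens the weighting toward the most similar neighbors. alpha shifts how strongly popular items are penalized.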
Just in case you need some data for practice with your GraphChi installation. 😉
Seriously, it is a nice way to gain familiarity with the data set.
What value you extract from it is up to you.