Algorithmic Music Discovery at Spotify by Chris Johnson.
From the description:
In this presentation I introduce various Machine Learning methods that we utilize for music recommendations and discovery at Spotify. Specifically, I focus on Implicit Matrix Factorization for Collaborative Filtering, how to implement a small scale version using python, numpy, and scipy, as well as how to scale up to 20 Million users and 24 Million songs using Hadoop and Spark.
Among a number of interesting points, Chris highlights differences between movie and music data.
One difference is that songs are consumed over and over again. Another is that users rate movies but “vote” by their streaming behavior on songs.*
Which leads to Chris’ main point: implicit matrix factorization. Code. The source code page points to: Collaborative Filtering for Implicit Feedback Datasets by Yifan Hu, Yehuda Koren, and Chris Volinsky.
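To make the idea concrete, here is a minimal numpy sketch of the alternating least squares formulation from the Hu, Koren, and Volinsky paper. It is not Chris' code. The function names (`als_step`, `implicit_als`, `loss`), the dense play-count matrix, and the `alpha` and `reg` values are my own assumptions for illustration. Play counts become a binary preference (listened or not) plus a confidence weight that grows with the count.

```python
import numpy as np

def als_step(R, Y, alpha=40.0, reg=0.1):
    """Solve for one side's factors while the other side (Y) is held fixed.

    R: (n_rows, n_cols) dense array of non-negative play counts.
    Y: (n_cols, k) factors of the fixed side.
    alpha, reg: illustrative hyperparameters (assumed, not from the talk).
    """
    k = Y.shape[1]
    YtY = Y.T @ Y                       # computed once, shared by every row
    X = np.zeros((R.shape[0], k))
    for u in range(R.shape[0]):
        r = R[u]
        c = 1.0 + alpha * r             # confidence: more plays, more weight
        p = (r > 0).astype(float)       # preference: 1 if played at all
        # Y^T C Y rewritten as Y^T Y + Y^T (C - I) Y
        A = YtY + (Y.T * (c - 1.0)) @ Y + reg * np.eye(k)
        b = Y.T @ (c * p)
        X[u] = np.linalg.solve(A, b)
    return X

def implicit_als(R, k=2, n_iters=10, alpha=40.0, reg=0.1, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(R.shape[0], k))
    Y = rng.normal(scale=0.1, size=(R.shape[1], k))
    for _ in range(n_iters):
        X = als_step(R, Y, alpha, reg)      # users, items fixed
        Y = als_step(R.T, X, alpha, reg)    # items, users fixed
    return X, Y

def loss(R, X, Y, alpha=40.0, reg=0.1):
    """Confidence-weighted squared error plus L2 regularization."""
    C = 1.0 + alpha * R
    P = (R > 0).astype(float)
    return np.sum(C * (P - X @ Y.T) ** 2) + reg * (np.sum(X ** 2) + np.sum(Y ** 2))

# Toy data: rows are users, columns are songs, values are play counts.
R = np.array([[5, 0, 0, 1],
              [4, 0, 0, 0],
              [0, 3, 2, 0],
              [0, 4, 0, 0]], dtype=float)
X, Y = implicit_als(R, k=2, n_iters=5)
scores = X @ Y.T    # predicted preference for every user/song pair
```

Each half-step solves its least squares problem exactly, so the loss cannot increase from one step to the next. A real data set would need sparse matrices (scipy) rather than the dense array used here.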
Scaling that process is represented in blocks for Hadoop and Spark.
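I can't reproduce the Hadoop or Spark jobs here, but the property that makes blocking possible is easy to show on one machine. With the item factors held fixed, each user's least squares solve is independent of every other user's, so users can be split into blocks and solved separately. The sketch below is my own illustration under that reading. `solve_rows` and `solve_rows_in_blocks` are hypothetical names, and the blocks are processed in a plain loop rather than on a cluster.

```python
import numpy as np

def solve_rows(R_block, Y, alpha=40.0, reg=0.1):
    """Implicit-feedback least squares solve for a block of users, Y fixed."""
    k = Y.shape[1]
    YtY = Y.T @ Y                        # would be shared with every worker
    out = np.empty((R_block.shape[0], k))
    for u, r in enumerate(R_block):
        c = 1.0 + alpha * r              # confidence from play counts
        A = YtY + (Y.T * (c - 1.0)) @ Y + reg * np.eye(k)
        out[u] = np.linalg.solve(A, Y.T @ (c * (r > 0)))
    return out

def solve_rows_in_blocks(R, Y, n_blocks, **kw):
    """Split users into blocks, solve each block alone, stack the results."""
    blocks = np.array_split(R, n_blocks, axis=0)
    return np.vstack([solve_rows(b, Y, **kw) for b in blocks])

rng = np.random.default_rng(0)
R = rng.integers(0, 5, size=(7, 6)).astype(float)   # toy play counts
Y = rng.normal(size=(6, 3))                         # fixed item factors

full = solve_rows(R, Y)
blocked = solve_rows_in_blocks(R, Y, n_blocks=3)
same = np.allclose(full, blocked)   # blocking does not change the answer
```

The same argument applies to the item update with the user factors held fixed.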
* I suspect that “behavior” is more reliable than “ratings” from the same user. Reasoning: ratings are more likely to be subject to social influences. I don’t have any research at my fingertips on that issue. Do you?