Using Cassandra to Build a Naive Bayes Classifier of Users Based Upon Behavior by John Berryman.
From the post:
In our last post, we found out how simple it is to use Cassandra to estimate ad conversion. It’s easy, because effectively all you have to do is accumulate counts – and Cassandra is quite good at counting. As we demonstrated in that post, Cassandra can be used as a giant, distributed, redundant, “infinitely” scalable counting framework. During this post will take the online ad company example just a bit further by creating a Cassandra-backed Naive Bayes Classifier. Again, we see that the “secret sauce” is simply keeping track of the appropriate counts.
In the previous post, we helped equip your online ad company with the ability to track ad conversion rates. But competition is steep and we’ll need to do a little better than ad conversion rates if your company is to stay on top. Recently, suspicions have arisen that ads are often being shown to unlikely customers. A quick look at the logs confirms this concern. For instance, there was a case of one internet user that clicked almost every single ad that he was shown – so long as it related to the camping gear. Several times, he went on to make purchases: a tent, a lantern, and a sleeping bag. But despite this users obvious interest in outdoor sporting goods, your logs indicated that fully 90% of the ads he was shown were for women’s apparel. Of these ads, this user clicked none of them.
Let’s attack this problem by creating a classifier. Fortunately for us, your company specializes in two main genres, fashion, and outdoors sporting goods. If we can determine which type of user we’re dealing with, then we can improve our conversion rates considerably by simply showing users the appropriate ads.
So long as you remember the unlikely assumption of feature independence of Naive Bayes, you should be ok.
That is whatever features you are measuring are independent of each other.
Has been “successfully” used in a number of contexts, but the descriptions I have read don’t specify what they meant by “successful.” 😉