Archive for the ‘Associative Classification Mining’ Category

Machine Learning and Data Mining – Association Analysis with Python

Thursday, January 17th, 2013

Machine Learning and Data Mining – Association Analysis with Python by Marcel Caraciolo.

From the post:

Recently I’ve been working with recommender systems and association analysis. This last one, specially, is one of the most used machine learning algorithms to extract from large datasets hidden relationships.

The famous example related to the study of association analysis is the history of the baby diapers and beers. This history reports that a certain grocery store in the Midwest of the United States increased their beers sells by putting them near where the stippers were placed. In fact, what happened is that the association rules pointed out that men bought diapers and beers on Thursdays. So the store could have profited by placing those products together, which would increase the sales.

Association analysis is the task of finding interesting relationships in large data sets. There hidden relationships are then expressed as a collection of association rules and frequent item sets. Frequent item sets are simply a collection of items that frequently occur together. And association rules suggest a strong relationship that exists between two items.

When I think of associations in a topic map, I assume I am at least starting with the roles and the players of those roles.

As this post demonstrates, that may be overly optimistic on my part.

What if I discover an association but not its type or the roles in it? And yet I still want to preserve the discovery for later use?

An incomplete association as it were.


Fast rule-based bioactivity prediction using associative classification mining

Sunday, November 25th, 2012

Fast rule-based bioactivity prediction using associative classification mining by Pulan Yu and David J Wild. (Journal of Cheminformatics 2012, 4:29 )

Who moved my acronym? continues: ACM = Association for Computing Machinery or associative classification mining.


Relating chemical features to bioactivities is critical in molecular design and is used extensively in lead discovery and optimization process. A variety of techniques from statistics, data mining and machine learning have been applied to this process. In this study, we utilize a collection of methods, called associative classification mining (ACM), which are popular in the data mining community, but so far have not been applied widely in cheminformatics. More specifically, the classification based on predictive association rules (CPAR), classification based on multiple association rules (CMAR) and classification based on association rules (CBA) are employed on three datasets using various descriptor sets. Experimental evaluations on anti-tuberculosis (antiTB), mutagenicity and hERG (the human Ether-a-go-go-Related Gene) blocker datasets show that these three methods are computationally scalable and appropriate for high speed mining. Additionally, they provide comparable accuracy and efficiency to the commonly used Bayesian and support vector machines (SVM) method, and produce highly interpretable models.

An interesting lead on investigation of associations in large data sets. Pass on those meeting a threshold on for further evaluation?