Archive for the ‘Classifier’ Category

Is it a vehicle? A helicopter? No, it’s a rifle! Messing with Machine Learning

Wednesday, December 20th, 2017

Partial Information Attacks on Real-world AI

From the post:

We’ve developed a query-efficient approach for finding adversarial examples for black-box machine learning classifiers. We can even produce adversarial examples in the partial information black-box setting, where the attacker only gets access to “scores” for a small number of likely classes, as is the case with commercial services such as Google Cloud Vision (GCV).

The post is a quick read (est. 2 minutes) with references but you really need to see:

Query-efficient Black-box Adversarial Examples by Andrew Ilyas, Logan Engstrom, Anish Athalye, Jessy Lin.


Current neural network-based image classifiers are susceptible to adversarial examples, even in the black-box setting, where the attacker is limited to query access without access to gradients. Previous methods — substitute networks and coordinate-based finite-difference methods — are either unreliable or query-inefficient, making these methods impractical for certain problems.

We introduce a new method for reliably generating adversarial examples under more restricted, practical black-box threat models. First, we apply natural evolution strategies to perform black-box attacks using two to three orders of magnitude fewer queries than previous methods. Second, we introduce a new algorithm to perform targeted adversarial attacks in the partial-information setting, where the attacker only has access to a limited number of target classes. Using these techniques, we successfully perform the first targeted adversarial attack against a commercially deployed machine learning system, the Google Cloud Vision API, in the partial information setting.

The paper contains this example:

How does it go? Seeing is believing!

Defeating image classifiers will be an exploding market for jewel merchants, bankers, diplomats, and others with reasons to avoid being captured by modern image classification systems.
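
For readers who want a feel for the "natural evolution strategies" step in the abstract above, here is a minimal sketch of estimating a gradient from queries alone. It is not the authors' code: `toy_score` is a hypothetical stand-in for a classifier's score on the target class, and the antithetic sampling, `sigma` and `pairs` values are my assumptions for illustration.

```python
import random

def nes_gradient(score, x, sigma=0.1, pairs=2000, rng=None):
    """Estimate the gradient of a black-box `score` at `x` using only queries."""
    rng = rng or random.Random(0)
    grad = [0.0] * len(x)
    for _ in range(pairs):
        # Antithetic sampling: query at x + sigma*delta and at x - sigma*delta.
        delta = [rng.gauss(0.0, 1.0) for _ in x]
        plus = score([xi + sigma * di for xi, di in zip(x, delta)])
        minus = score([xi - sigma * di for xi, di in zip(x, delta)])
        for i, di in enumerate(delta):
            grad[i] += (plus - minus) * di
    return [g / (2 * sigma * pairs) for g in grad]

def toy_score(x):
    # Hypothetical score function; its true gradient at (0, 0) is (2, -4).
    return -(x[0] - 1.0) ** 2 - (x[1] + 2.0) ** 2

estimate = nes_gradient(toy_score, [0.0, 0.0])
print(estimate)  # a noisy estimate that should land near (2, -4)
```

An attacker would then nudge the image along the estimated gradient and repeat. The sketch spends 4,000 queries on a two-dimensional toy, so it says nothing about the query efficiency reported in the paper.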

Build your own neural network classifier in R

Wednesday, February 10th, 2016

Build your own neural network classifier in R by Jun Ma.

From the post:

Image classification is one important field in Computer Vision, not only because so many applications are associated with it, but also a lot of Computer Vision problems can be effectively reduced to image classification. The state of art tool in image classification is Convolutional Neural Network (CNN). In this article, I am going to write a simple Neural Network with 2 layers (fully connected). First, I will train it to classify a set of 4-class 2D data and visualize the decision boundary. Second, I am going to train my NN with the famous MNIST data (you can download it here) and see its performance. The first part is inspired by CS 231n course offered by Stanford, which is taught in Python.

One suggestion, based on some unrelated reading: don’t copy-n-paste the code.

Key in the code so you will get accustomed to your typical typing mistakes, which are no doubt different from mine!

Plus you will develop muscle memory in your fingers and code will either “look right” or not.


PS: For R, Jun’s blog looks like one you need to start following!

A review of learning vector quantization classifiers

Wednesday, September 23rd, 2015

A review of learning vector quantization classifiers by David Nova, Pablo A. Estevez.


In this work we present a review of the state of the art of Learning Vector Quantization (LVQ) classifiers. A taxonomy is proposed which integrates the most relevant LVQ approaches to date. The main concepts associated with modern LVQ approaches are defined. A comparison is made among eleven LVQ classifiers using one real-world and two artificial datasets.

From the introduction:

Learning Vector Quantization (LVQ) is a family of algorithms for statistical pattern classification, which aims at learning prototypes (codebook vectors) representing class regions. The class regions are defined by hyperplanes between prototypes, yielding Voronoi partitions. In the late 80’s Teuvo Kohonen introduced the algorithm LVQ1 [36, 38], and over the years produced several variants. Since their inception LVQ algorithms have been researched by a small but active community. A search on the ISI Web of Science in November, 2013, found 665 journal articles with the keywords “Learning Vector Quantization” or “LVQ” in their titles or abstracts. This paper is a review of the progress made in the field during the last 25 years.

Heavy sledding but if you want to review the development of a classification algorithm with a manageable history, this is a likely place to start.
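
To make "learning prototypes (codebook vectors)" concrete, here is a minimal sketch of the basic LVQ1 update as I understand it: find the nearest prototype, pull it toward the sample if the labels agree, push it away if they don't. The toy data, the fixed learning rate `alpha` and the function names are my own assumptions, not taken from the review.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda k: sum((p - xi) ** 2 for p, xi in zip(prototypes[k][0], x)))

def lvq1_train(prototypes, samples, alpha=0.1, epochs=20):
    """prototypes: list of [vector, label]; samples: list of (vector, label)."""
    for _ in range(epochs):
        for x, label in samples:
            k = nearest(prototypes, x)
            vector, proto_label = prototypes[k]
            # Attract the winning prototype on a label match, repel it otherwise.
            sign = 1.0 if proto_label == label else -1.0
            prototypes[k][0] = [p + sign * alpha * (xi - p) for p, xi in zip(vector, x)]
    return prototypes

def lvq1_predict(prototypes, x):
    """Classify x with the label of its nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]

# Hypothetical two-class toy data.
samples = [([0.0, 0.0], "a"), ([1.0, 0.0], "a"), ([0.0, 1.0], "a"),
           ([4.0, 4.0], "b"), ([5.0, 4.0], "b"), ([4.0, 5.0], "b")]
prototypes = lvq1_train([[[1.0, 1.0], "a"], [[3.0, 3.0], "b"]], samples)
print(lvq1_predict(prototypes, [0.5, 0.5]))  # a
print(lvq1_predict(prototypes, [4.5, 4.5]))  # b
```

Practical implementations usually decay the learning rate over time; I kept it fixed to keep the sketch short.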


Perceptual feature-based song genre classification using RANSAC [Published?]

Tuesday, June 30th, 2015

Perceptual feature-based song genre classification using RANSAC by Arijit Ghosal; Rudrasis Chakraborty; Bibhas Chandra Dhara; Sanjoy Kumar Saha. International Journal of Computational Intelligence Studies (IJCISTUDIES), Vol. 4, No. 1, 2015.


In the context of a content-based music retrieval system or archiving digital audio data, genre-based classification of song may serve as a fundamental step. In the earlier attempts, researchers have described the song content by a combination of different types of features. Such features include various frequency and time domain descriptors depicting the signal aspects. Perceptual aspects also have been combined along with. A listener perceives a song mostly in terms of its tempo (rhythm), periodicity, pitch and their variation and based on those recognises the genre of the song. Motivated by this observation, in this work, instead of dealing with wide range of features we have focused only on the perceptual aspect like melody and rhythm. In order to do so audio content is described based on pitch, tempo, amplitude variation pattern and periodicity. Dimensionality of descriptor vector is reduced and finally, random sample and consensus (RANSAC) is used as the classifier. Experimental result indicates the effectiveness of the proposed scheme.

A new approach to classification of music, but that’s all I can say since the content is behind a pay-wall.

One way to increase the accessibility of texts would be for tenure committees to not consider publications as “published” until they are freely available from the author’s webpage.

That one change could encourage authors to press for the right to post their own materials and to follow through with posting them as soon as possible.

Feel free to forward this post to members of your local tenure committee.

Flock: Hybrid Crowd-Machine Learning Classifiers

Monday, March 16th, 2015

Flock: Hybrid Crowd-Machine Learning Classifiers by Justin Cheng and Michael S. Bernstein.


We present hybrid crowd-machine learning classifiers: classification models that start with a written description of a learning goal, use the crowd to suggest predictive features and label data, and then weigh these features using machine learning to produce models that are accurate and use human-understandable features. These hybrid classifiers enable fast prototyping of machine learning models that can improve on both algorithm performance and human judgment, and accomplish tasks where automated feature extraction is not yet feasible. Flock, an interactive machine learning platform, instantiates this approach. To generate informative features, Flock asks the crowd to compare paired examples, an approach inspired by analogical encoding. The crowd’s efforts can be focused on specific subsets of the input space where machine-extracted features are not predictive, or instead used to partition the input space and improve algorithm performance in subregions of the space. An evaluation on six prediction tasks, ranging from detecting deception to differentiating impressionist artists, demonstrated that aggregating crowd features improves upon both asking the crowd for a direct prediction and off-the-shelf machine learning features by over 10%. Further, hybrid systems that use both crowd-nominated and machine-extracted features can outperform those that use either in isolation.

Let’s see, suggest predictive features (subject identifiers in the non-topic map technical sense) and label data (identify instances of a subject), sounds a lot easier than some of the tedium I have seen for authoring a topic map.

I particularly like the “inducing” of features versus relying on a crowd to suggest identifying features. I suspect that would work well in a topic map authoring context, sans the machine learning aspects.

This paper is being presented this week at CSCW 2015, so you aren’t too far behind. 😉

How would you structure an inducement mechanism for authoring a topic map?

Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?

Thursday, December 11th, 2014

Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? by Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. (Journal of Machine Learning Research 15 (2014) 3133-3181)


We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI data base (excluding the large-scale problems) and other own real problems, in order to achieve significant conclusions about the classifier behavior, not dependent on the data set collection. The classifiers most likely to be the bests are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy overcoming 90% in the 84.3% of the data sets. However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 out of 5 bests classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).

Keywords: classification, UCI data base, random forest, support vector machine, neural networks, decision trees, ensembles, rule-based classifiers, discriminant analysis, Bayesian classifiers, generalized linear models, partial least squares and principal component regression, multiple adaptive regression splines, nearest-neighbors, logistic and multinomial regression

Deeply impressive work but I can hear in the distance the girding of loins and sharpening of tools of scholarly disagreement. 😉

If you are looking for a very comprehensive reference of current classifiers, this is the paper for you.

For the practicing data scientist I think the lesson is to learn a small number of the better classifiers and to not fret overmuch about the lesser ones. If a major breakthrough in classification techniques does happen, it will be in the major tools with great fanfare.
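
The "percent of the maximum accuracy" figures in the abstract are easier to read with a small worked example. This is my reading of the measure (each classifier's accuracy divided by the best accuracy any classifier achieved on that data set, averaged over data sets), and the numbers below are invented for illustration only.

```python
# Hypothetical accuracies (percent) of three classifiers on three data sets.
accuracy = {
    "rf":  [90.0, 80.0, 70.0],
    "svm": [85.0, 80.0, 75.0],
    "knn": [60.0, 72.0, 75.0],
}

def percent_of_max(accuracy):
    """Average, over data sets, of accuracy relative to the best classifier there."""
    n = len(next(iter(accuracy.values())))
    best = [max(acc[d] for acc in accuracy.values()) for d in range(n)]
    return {name: 100.0 * sum(a / b for a, b in zip(acc, best)) / n
            for name, acc in accuracy.items()}

for name, value in percent_of_max(accuracy).items():
    print(name, round(value, 2))
# rf 97.78
# svm 98.15
# knn 85.56
```

Note that a classifier can score well on this measure without being the best on any single data set, which is one reason to expect the scholarly disagreement.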

I first saw this in a tweet by Jason Baldridge.

Pattern recognition toolbox

Monday, December 30th, 2013

Pattern recognition toolbox by Thomas W. Rauber.

From the webpage:

TOOLDIAG is a collection of methods for statistical pattern recognition. The main area of application is classification. The application area is limited to multidimensional continuous features, without any missing values. No symbolic features (attributes) are allowed. The program is implemented in the ‘C’ programming language and was tested in several computing environments. The user interface is simple, command-line oriented, but the methods behind it are efficient and fast. You can customize your own methods on the application programming level with relatively little effort. If you wish a presentation of the theory behind the program at your university, feel free to contact me.

Command line classification. A higher learning curve than some, but expect greater flexibility as well.

I thought the requirement of “no missing values” was curious.

If you have a data set with some legitimately missing values, how are you going to replace them in a neutral way?

Feature Selection with Scikit-Learn

Sunday, May 26th, 2013

Feature Selection with Scikit Learn by Sujit Pal.

From the post:

I am currently doing the Web Intelligence and Big Data course from Coursera, and one of the assignments was to predict a person’s ethnicity from a set of about 200,000 genetic markers (provided as boolean values). As you can see, a simple classification problem.

One of the optimization suggestions for the exercise was to prune the featureset. Prior to this, I had only a vague notion that one could do this by running correlations of each feature against the outcome, and choosing the most highly correlated ones. This seemed like a good opportunity to learn a bit about this, so I did some reading and digging within Scikit-Learn to find if they had something to do this (they did). I also decided to investigate how the accuracy of a classifier varies with the feature size. This post is a result of this effort.

The IR Book has a sub-chapter on Feature Selection. Three main approaches to Feature Selection are covered – Mutual Information based, Chi-square based and Frequency based. Scikit-Learn provides several methods to select features based on Chi-Squared and ANOVA F-values for classification. I learned about this from Matt Spitz’s passing reference to Chi-squared feature selection in Scikit-Learn in his Slugger ML talk at Pycon USA 2012.

In the code below, I compute the accuracies with various feature sizes for 9 different classifiers, using both the Chi-squared measure and the ANOVA F measures.

Sujit uses Scikit-Learn to investigate the accuracy of classifiers.
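
If you want to see what Chi-squared feature selection is doing before reaching for a library, here is a minimal sketch using the textbook 2x2 contingency-table statistic for boolean features. It is not Sujit's code and does not reproduce Scikit-Learn's implementation; the toy markers, labels and function names are assumptions of mine.

```python
def chi_squared(feature, labels):
    """Chi-squared statistic for one boolean feature against a binary label."""
    a = sum(1 for f, y in zip(feature, labels) if f and y)          # feature on, class 1
    b = sum(1 for f, y in zip(feature, labels) if f and not y)      # feature on, class 0
    c = sum(1 for f, y in zip(feature, labels) if not f and y)      # feature off, class 1
    d = sum(1 for f, y in zip(feature, labels) if not f and not y)  # feature off, class 0
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_k_best(rows, labels, k):
    """Return the indices of the k highest-scoring columns, plus all scores."""
    columns = list(zip(*rows))
    scores = [chi_squared(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores

# Hypothetical data: six people, three boolean markers, binary class label.
rows = [(1, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 0), (0, 0, 1), (0, 1, 0)]
labels = [1, 1, 1, 0, 0, 0]
kept, scores = select_k_best(rows, labels, k=1)
print(kept)  # [0], the marker that tracks the label exactly
```

With 200,000 markers you would want the vectorized library version, but the ranking idea is the same: score every feature against the outcome and keep the top k.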

Label propagation in GraphChi

Monday, February 11th, 2013

Label propagation in GraphChi by Danny Bickson.

From the post:

A few days ago I got a request from Jidong, from the Chinese Renren company to implement label propagation in GraphChi. The algorithm is very simple described here: Zhu, Xiaojin, and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.

The basic idea is that we start with a group of users that we have some information about the categories they are interested in. Following the weights in the social network, we propagate the label probabilities from the user seed node (the ones we have label information about) into the general social network population. After several iterations, the algorithm converges and the output is labels for the unknown nodes.

I assume there is more unlabeled data for topic maps than labeled data.

Depending upon your requirements, this could prove to be a useful technique for completing those unlabeled nodes.
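
The basic idea in the quoted description can be sketched in a few lines. This is a minimal, single-machine illustration of label propagation with clamped seed nodes, not Danny's GraphChi implementation; the four-node graph, the class names and the fixed iteration count are assumptions for the example.

```python
def propagate(weights, seeds, classes, iterations=50):
    """weights: {node: {neighbor: weight}}; seeds: {node: known class}."""
    uniform = {c: 1.0 / len(classes) for c in classes}
    probs = {n: ({c: float(c == seeds[n]) for c in classes} if n in seeds else dict(uniform))
             for n in weights}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in weights.items():
            if node in seeds:
                updated[node] = probs[node]  # seed labels stay clamped
                continue
            total = sum(nbrs.values())
            # Weighted average of the neighbors' current label probabilities.
            updated[node] = {c: sum(w * probs[m][c] for m, w in nbrs.items()) / total
                             for c in classes}
        probs = updated
    return probs

# Hypothetical chain of users a - b - c - d with unit edge weights.
weights = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0},
           "c": {"b": 1.0, "d": 1.0}, "d": {"c": 1.0}}
probs = propagate(weights, {"a": "sports", "d": "music"}, ["sports", "music"])
print(round(probs["b"]["sports"], 3))  # 0.667
print(round(probs["c"]["sports"], 3))  # 0.333
```

The sketch assumes every unlabeled node has at least one neighbor; an isolated node would need special handling.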

Graph Based Classification Methods Using Inaccurate External Classifier Information

Wednesday, January 30th, 2013

Graph Based Classification Methods Using Inaccurate External Classifier Information by Sundararajan Sellamanickam and Sathiya Keerthi Selvaraj.


In this paper we consider the problem of collectively classifying entities where relational information is available across the entities. In practice inaccurate class distribution for each entity is often available from another (external) classifier. For example this distribution could come from a classifier built using content features or a simple dictionary. Given the relational and inaccurate external classifier information, we consider two graph based settings in which the problem of collective classification can be solved. In the first setting the class distribution is used to fix labels to a subset of nodes and the labels for the remaining nodes are obtained like in a transductive setting. In the other setting the class distributions of all nodes are used to define the fitting function part of a graph regularized objective function. We define a generalized objective function that handles both the settings. Methods like harmonic Gaussian field and local-global consistency (LGC) reported in the literature can be seen as special cases. We extend the LGC and weighted vote relational neighbor classification (WvRN) methods to support usage of external classifier information. We also propose an efficient least squares regularization (LSR) based method and relate it to information regularization methods. All the methods are evaluated on several benchmark and real world datasets. Considering together speed, robustness and accuracy, experimental results indicate that the LSR and WvRN-extension methods perform better than other methods.

Doesn’t read like a page-turner does it? 😉

An example from the paper will help illustrate why this is an important paper:

In this paper we consider a related relational learning problem where, instead of a subset of labeled nodes, we have inaccurate external label/class distribution information for each node. This problem arises in many web applications. Consider, for example, the problem of identifying pages about Public works, Court, Health, Community development, Library etc. within the web site of a particular city. The link and directory relations contain useful signals for solving such a classification problem. Note that this relational structure will be different for different city web sites. If we are only interested in a small number of cities then we can afford to label a number of pages in each site and then apply transductive learning using the labeled nodes. But, if we want to do the classification on hundreds of thousands of city sites, labeling on all sites is expensive and we need to take a different approach. One possibility is to use a selected set of content dictionary features together with the labeling of a small random sample of pages from a number of sites to learn an inaccurate probabilistic classifier, e.g., logistic regression. Now, for any one city web site, the output of this initial classifier can be used to generate class distributions for the pages in the site, which can then be used together with the relational information in the site to get accurate classification.

In topic map parlance, we would say identity was being established by the associations in which a topic participates but that is a matter of terminology and not substantive difference.

Class-imbalanced classifiers for high-dimensional data

Tuesday, January 22nd, 2013

Class-imbalanced classifiers for high-dimensional data by Wei-Jiun Lin and James J. Chen. (Brief Bioinform (2013) 14 (1): 13-26. doi: 10.1093/bib/bbs006)


A class-imbalanced classifier is a decision rule to predict the class membership of new samples from an available data set where the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class resulting in poor accuracy in the minority class prediction. A class-imbalanced classifier typically modifies a standard classifier by a correction strategy or by incorporating a new strategy in the training phase to account for differential class sizes. This article reviews and evaluates some most important methods for class prediction of high-dimensional imbalanced data. The evaluation addresses the fundamental issues of the class-imbalanced classification problem: imbalance ratio, small disjuncts and overlap complexity, lack of data and feature selection. Four class-imbalanced classifiers are considered. The four classifiers include three standard classification algorithms each coupled with an ensemble correction strategy and one support vector machines (SVM)-based correction classifier. The three algorithms are (i) diagonal linear discriminant analysis (DLDA), (ii) random forests (RFs) and (iii) SVMs. The SVM-based correction classifier is SVM threshold adjustment (SVM-THR). A Monte–Carlo simulation and five genomic data sets were used to illustrate the analysis and address the issues. The SVM-ensemble classifier appears to perform the best when the class imbalance is not too severe. The SVM-THR performs well if the imbalance is severe and predictors are highly correlated. The DLDA with a feature selection can perform well without using the ensemble correction.

At least the “big data” folks are right on one score: We are going to need help sorting out all the present and future information.

Not that we will ever attempt to sort it all out. As was reported in The Untapped Big Data Gap (2012) [Merry Christmas Topic Maps!], only 23% of “big data” is going to be valuable if we do analyze it.

And your enterprise’s part of that 23% is even smaller.

Enough that your users will need help dealing with it, but not nearly the deluge that is being predicted.

How do you compare two text classifiers?

Friday, May 4th, 2012

How do you compare two text classifiers?

Tony Russell-Rose writes:

I need to compare two text classifiers – one human, one machine. They are assigning multiple tags from an ontology. We have an initial corpus of ~700 records tagged by both classifiers. The goal is to measure the ‘value added’ by the human. However, we don’t yet have any ground truth data (i.e. agreed annotations).

Any ideas on how best to approach this problem in a commercial environment (i.e. quickly, simply, with minimum fuss), or indeed what’s possible?

I thought of measuring the absolute delta between the two profiles (regardless of polarity) to give a ceiling on the value added, and/or comparing the profile of tags added by each human coder against the centroid to give a crude measure of inter-coder agreement (and hence difficulty of the task). But neither really measures the ‘value added’ that I’m looking for, so I’m sure there must be better solutions.

Suggestions, anyone? Or is this as far as we can go without ground truth data?
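
Tony's first idea, the absolute delta between the two profiles, might look something like the sketch below. I am assuming a "profile" means the count of each tag across the corpus, which may not be what Tony intends; the records and tag names are invented, and the per-record comparison is my own addition for contrast.

```python
from collections import Counter

def tag_profile(records):
    """Count how often each tag is assigned across all records."""
    return Counter(tag for tags in records for tag in tags)

def absolute_delta(profile_a, profile_b):
    """Sum of absolute differences in tag counts, regardless of polarity."""
    tags = set(profile_a) | set(profile_b)
    return sum(abs(profile_a[t] - profile_b[t]) for t in tags)

def per_record_disagreement(records_a, records_b):
    """Number of tags assigned by exactly one classifier, summed over records."""
    return sum(len(a ^ b) for a, b in zip(records_a, records_b))

# Hypothetical tags for the same three records.
human = [{"finance", "risk"}, {"health"}, {"finance"}]
machine = [{"finance"}, {"health", "risk"}, {"legal"}]

print(absolute_delta(tag_profile(human), tag_profile(machine)))  # 2
print(per_record_disagreement(human, machine))                   # 4
```

The two numbers differ because corpus-level profiles let disagreements cancel: both classifiers used "risk" once, just on different records. Neither number says which classifier was right, which is Tony's point about ground truth.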

Some useful comments have been made. Do you have others?

PS: I wrote at Tony’s blog in a comment:


The ‘value added’ by human taggers concept is unclear. The tagging in both cases is the result of human adding of semantics. Once through the rules for the machine tagger and once via the “human” taggers.

Can you say a bit more about what you see as a separate ‘value added’ by the human taggers?

What do you think? Is Tony’s question clear enough?

Classifier Technology and the Illusion of Progress

Friday, April 13th, 2012

Classifier Technology and the Illusion of Progress by David J. Hand.

Was pointed to in Simply Statistics for 8 April 2012:


A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.

The original pointer didn’t mention there were four published comments and a formal rejoinder:

Comment: Classifier Technology and the Illusion of Progress by Jerome H. Friedman.

Comment: Classifier Technology and the Illusion of Progress–Credit Scoring by Ross W. Gayler.

Elaboration on Two Points Raised in “Classifier Technology and the Illusion of Progress” by Robert C. Holte.

Comment: Classifier Technology and the Illusion of Progress by Robert A. Stine.

Rejoinder: Classifier Technology and the Illusion of Progress by David J. Hand.

Enjoyable reading, one and all!

Adobe Releases Malware Classifier Tool

Wednesday, April 4th, 2012

Adobe Releases Malware Classifier Tool by Dennis Fisher.

From the post:

Adobe has published a free tool that can help administrators and security researchers classify suspicious files as malicious or benign, using specific machine-learning algorithms. The tool is a command-line utility that Adobe officials hope will make binary classification a little easier.

Adobe researcher Karthik Raman developed the new Malware Classifier tool to help with the company’s internal needs and then decided that it might be useful for external users, as well.

“To make life easier, I wrote a Python tool for quick malware triage for our team. I’ve since decided to make this tool, called “Adobe Malware Classifier,” available to other first responders (malware analysts, IT admins and security researchers of any stripe) as an open-source tool, since you might find it equally helpful,” Raman wrote in a blog post.

“Malware Classifier uses machine learning algorithms to classify Win32 binaries – EXEs and DLLs – into three classes: 0 for “clean,” 1 for “malicious,” or “UNKNOWN.” The tool extracts seven key features from a binary, feeds them to one or all of the four classifiers, and presents its classification results.”

Adobe Malware Classifier (Sourceforge)

Old hat that malware scanners have been using machine learning but new that you can now see it from the inside.

Lessons to be learned about machine learning algorithms for malware and other uses with software.

Kudos to Adobe!

Will the Circle Be Unbroken? Interactive Annotation!

Wednesday, February 29th, 2012

I have to agree with Bob Carpenter, the title is a bit much:

Closing the Loop: Fast, Interactive Semi-Supervised Annotation with Queries on Features and Instances

From the post:

Whew, that was a long title. Luckily, the paper’s worth it:

Settles, Burr. 2011. Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances. EMNLP.

It’s a paper that shows you how to use active learning to build reasonably high-performance classifier with only minutes of user effort. Very cool and right up our alley here at LingPipe.

Both the paper and Bob’s review merit close reading.

Combining Heterogeneous Classifiers for Relational Databases (Of Relational Prisons and such)

Sunday, January 22nd, 2012

Combining Heterogeneous Classifiers for Relational Databases by Geetha Manjunatha, M Narasimha Murty and Dinkar Sitaram.


Most enterprise data is distributed in multiple relational databases with expert-designed schema. Using traditional single-table machine learning techniques over such data not only incur a computational penalty for converting to a ‘flat’ form (mega-join), even the human-specified semantic information present in the relations is lost. In this paper, we present a practical, two-phase hierarchical meta-classification algorithm for relational databases with a semantic divide and conquer approach. We propose a recursive, prediction aggregation technique over heterogeneous classifiers applied on individual database tables. The proposed algorithm was evaluated on three diverse datasets, namely TPCH, PKDD and UCI benchmarks and showed considerable reduction in classification time without any loss of prediction accuracy.

When I read:

So, a typical enterprise dataset resides in such expert-designed multiple relational database tables. On the other hand, as known, most traditional classification algorithms still assume that the input dataset is available in a single table – a flat representation of data attributes. So, for applying these state-of-art single-table data mining techniques to enterprise data, one needs to convert the distributed relational data into a flat form.

a couple of things dropped into place.

First, the problem being described, the production of a flat form for analysis reminds me of the problem of record linkage in the late 1950’s (predating relational databases). There records were regularized to enable very similar analysis.

Second, as the authors state in a paragraph or so, conversion to such a format is not possible in most cases. Interesting that the choice of relational database table design has the impact of limiting the type of analysis that can be performed on the data.

Therefore, knowledge mining over real enterprise data using machine learning techniques is very valuable for what is called an intelligent enterprise. However, application of state-of-art pattern recognition techniques in the mainstream BI has not yet taken off [Gartner report] due to lack of in-memory analytics among others. The key hurdle to make this possible is the incompatibility between the input data formats used by most machine learning techniques and the formats used by real enterprises.

If freeing data from its relational prison is a key aspect to empowering business intelligence (BI), what would you suggest as a solution?

Domain Adaptation with Hierarchical Logistic Regression

Thursday, October 6th, 2011

Domain Adaptation with Hierarchical Logistic Regression

Bob Carpenter continues his series on domain adaptation:

Last post, I explained how to build hierarchical naive Bayes models for domain adaptation. That post covered the basic problem setup and motivation for hierarchical models.

Hierarchical Logistic Regression

Today, we’ll look at the so-called (in NLP) “discriminative” version of the domain adaptation problem. Specifically, using logistic regression. For simplicity, we’ll stick to the binary case, though this could all be generalized to K-way classifiers.

Logistic regression is more flexible than naive Bayes in allowing other features (aka predictors) to be brought in along with the words themselves. We’ll start with just the words, so the basic setup looks more like naive Bayes.

Domain Adaptation with Hierarchical Naive Bayes Classifiers

Sunday, September 25th, 2011

Domain Adaptation with Hierarchical Naive Bayes Classifiers by Bob Carpenter.

From the post:

This will be the first of two posts exploring hierarchical and multilevel classifiers. In this post, I’ll describe a hierarchical generalization of naive Bayes (what the NLP world calls a “generative” model). The next post will explore hierarchical logistic regression (called a “discriminative” or “log linear” or “max ent” model in NLP land).

Very entertaining and useful if you use NLP at all in your pre-topic map phase.

Top Scoring Pairs for Feature Selection in Machine Learning and Applications to Cancer Outcome Prediction

Friday, September 23rd, 2011

Top Scoring Pairs for Feature Selection in Machine Learning and Applications to Cancer Outcome Prediction by Ping Shi, Surajit Ray, Qifu Zhu and Mark A Kon.

BMC Bioinformatics 2011, 12:375 doi:10.1186/1471-2105-12-375 Published: 23 September 2011



The widely used k top scoring pair (k-TSP) algorithm is a simple yet powerful parameter-free classifier. It owes its success in many cancer microarray datasets to an effective feature selection algorithm that is based on relative expression ordering of gene pairs. However, its general robustness does not extend to some difficult datasets, such as those involving cancer outcome prediction, which may be due to the relatively simple voting scheme used by the classifier. We believe that the performance can be enhanced by separating its effective feature selection component and combining it with a powerful classifier such as the support vector machine (SVM). More generally the top scoring pairs generated by the k-TSP ranking algorithm can be used as a dimensionally reduced subspace for other machine learning classifiers.


We developed an approach integrating the k-TSP ranking algorithm (TSP) with other machine learning methods, allowing combination of the computationally efficient, multivariate feature ranking of k-TSP with multivariate classifiers such as SVM. We evaluated this hybrid scheme (k-TSP+SVM) in a range of simulated datasets with known data structures. As compared with other feature selection methods, such as a univariate method similar to Fisher’s discriminant criterion (Fisher), or a recursive feature elimination embedded in SVM (RFE), TSP is increasingly more effective than the other two methods as the informative genes become progressively more correlated, which is demonstrated both in terms of the classification performance and the ability to recover true informative genes. We also applied this hybrid scheme to four cancer prognosis datasets, in which k-TSP+SVM outperforms k-TSP classifier in all datasets, and achieves either comparable or superior performance to that using SVM alone. In concurrence with what is observed in simulation, TSP appears to be a better feature selector than Fisher and RFE in some of the cancer datasets.


The k-TSP ranking algorithm can be used as a computationally efficient, multivariate filter method for feature selection in machine learning. SVM in combination with k-TSP ranking algorithm outperforms k-TSP and SVM alone in simulated datasets and in some cancer prognosis datasets. Simulation studies suggest that as a feature selector, it is better tuned to certain data characteristics, i.e. correlations among informative genes, which is potentially interesting as an alternative feature ranking method in pathway analysis.
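For readers who have not met top scoring pairs before, here is a minimal Python sketch of the ranking idea, not the authors' implementation. It assumes two classes labeled 0 and 1 and scores each gene pair by how differently the ordering "gene i is expressed below gene j" occurs in the two classes. The function names (`tsp_scores`, `top_k_pairs`, `selected_genes`) are my own, and unlike the published k-TSP procedure this sketch does not force the chosen pairs to be disjoint.

```python
from itertools import combinations


def tsp_scores(samples, labels):
    """Score every gene pair (i, j) by |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|."""
    by_class = {0: [], 1: []}
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_genes = len(samples[0])
    scores = {}
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j.
        frac = {
            c: sum(1 for x in rows if x[i] < x[j]) / len(rows)
            for c, rows in by_class.items()
        }
        scores[(i, j)] = abs(frac[0] - frac[1])
    return scores


def top_k_pairs(samples, labels, k):
    """Return the k highest-scoring pairs (ties keep enumeration order)."""
    scores = tsp_scores(samples, labels)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def selected_genes(pairs):
    """Genes appearing in the chosen pairs: a reduced feature set for another classifier."""
    return sorted({g for pair in pairs for g in pair})


# Toy data: gene 0 sits below gene 1 in class 0 and above it in class 1.
samples = [[1, 2, 5], [1, 3, 4], [2, 1, 5], [3, 1, 6]]
labels = [0, 0, 1, 1]
pairs = top_k_pairs(samples, labels, k=1)
print(pairs, selected_genes(pairs))
```

The reduced gene list is what you would hand to an SVM or any other classifier in a hybrid scheme of the kind the abstract describes.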

Knowing the tools that are already in use in bioinformatics will help you design topic map applications of interest to those in that field. And this is a very nice combination of methods to study on its own.

Naive Bayes Classifiers – Python

Thursday, September 15th, 2011

Naive Bayes Classifiers – Python

From the post:

In naive Bayes classifiers, every feature gets a say in determining which label should be assigned to a given input value. To choose a label for an input value, the naive Bayes classifier begins by calculating the prior probability of each label, which is determined by checking the frequency of each label in the training set. The contribution from each feature is then combined with this prior probability, to arrive at a likelihood estimate for each label. The label whose likelihood estimate is the highest is then assigned to the input value.
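The procedure in that quote can be sketched in a few lines of plain Python. This is my own illustration, not code from the post: the names `train` and `classify` are hypothetical, features are assumed to arrive as a dict per input, and add-one smoothing is my choice for avoiding zero probabilities.

```python
import math
from collections import Counter


def train(labeled_featuresets):
    """Count label frequencies and per-label feature values."""
    label_counts = Counter()
    value_counts = {}   # (label, feature name) -> Counter of observed values
    values_seen = {}    # feature name -> set of all observed values
    for features, label in labeled_featuresets:
        label_counts[label] += 1
        for name, value in features.items():
            value_counts.setdefault((label, name), Counter())[value] += 1
            values_seen.setdefault(name, set()).add(value)
    return label_counts, value_counts, values_seen


def classify(model, features):
    """Return the label with the highest prior-times-feature-contributions score."""
    label_counts, value_counts, values_seen = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        # Prior: how often the label occurs in the training set.
        score = math.log(count / total)
        for name, value in features.items():
            seen = value_counts.get((label, name), Counter())[value]
            n_values = max(len(values_seen.get(name, ())), 1)
            # Each feature contributes a smoothed estimate of P(value | label).
            score += math.log((seen + 1) / (count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = (
    [({"contains_free": True}, "spam")] * 3
    + [({"contains_free": False}, "ham")] * 3
    + [({"contains_free": True}, "ham")]
)
model = train(training)
print(classify(model, {"contains_free": True}))
print(classify(model, {"contains_free": False}))
```

Working in log space keeps the product of many small probabilities from underflowing once there are more than a handful of features.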

Just one recent post from Python Language Processing. There are a number of others, some of which I will call out in future posts.

Predicate dispatching: A unified theory of dispatch

Friday, August 19th, 2011

Predicate dispatching: A unified theory of dispatch

The term predicate dispatching was new to me, so I checked Stack Overflow and found: What is predicate dispatch?

This paper was one of the answers; it is accompanied by slides, an implementation and a manual.


Predicate dispatching generalizes previous method dispatch mechanisms by permitting arbitrary predicates to control method applicability and by using logical implication between predicates as the overriding relationship. The method selected to handle a message send can depend not just on the classes of the arguments, as in ordinary object-oriented dispatch, but also on the classes of subcomponents, on an argument’s state, and on relationships between objects. This simple mechanism subsumes and extends object-oriented single and multiple dispatch, ML-style pattern matching, predicate classes, and classifiers, which can all be regarded as syntactic sugar for predicate dispatching. This paper introduces predicate dispatching, gives motivating examples, and presents its static and dynamic semantics. An implementation of predicate dispatching is available.
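To make the idea concrete, here is a toy Python sketch of my own, not the paper's implementation. Each method carries an arbitrary predicate over the arguments, and the most specific applicable method wins. One loud assumption: the paper derives the overriding relationship from logical implication between predicates, whereas this sketch asks you to declare it by hand through a hypothetical `overrides` argument (and does not take the transitive closure for you).

```python
class PredicateDispatcher:
    """Chooses an implementation by testing predicates on the arguments."""

    def __init__(self):
        # name -> (predicate, implementation, names this method overrides)
        self._methods = {}

    def method(self, name, predicate, overrides=()):
        def decorator(func):
            self._methods[name] = (predicate, func, frozenset(overrides))
            return func
        return decorator

    def __call__(self, *args):
        # A method is applicable when its predicate holds for the arguments.
        applicable = {
            name: m for name, m in self._methods.items() if m[0](*args)
        }
        if not applicable:
            raise TypeError("no applicable method")
        # Drop every applicable method that another applicable method overrides.
        overridden = set()
        for _, _, overrides in applicable.values():
            overridden |= overrides
        candidates = [name for name in applicable if name not in overridden]
        if len(candidates) != 1:
            raise TypeError("ambiguous dispatch: %s" % sorted(candidates))
        return applicable[candidates[0]][1](*args)


describe = PredicateDispatcher()


@describe.method("anything", lambda x: True)
def describe_anything(x):
    return "something"


@describe.method("integer", lambda x: isinstance(x, int), overrides=["anything"])
def describe_integer(x):
    return "integer"


@describe.method(
    "negative",
    lambda x: isinstance(x, int) and x < 0,
    overrides=["anything", "integer"],
)
def describe_negative(x):
    return "negative integer"


print(describe("a"), describe(5), describe(-3))
```

The third method dispatches on the argument's state (its sign), which is the kind of thing ordinary class-based dispatch cannot express.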

Thought it might be interesting weekend reading.

Sentiment Analysis: Machines Are Like Us

Friday, August 5th, 2011

Sentiment Analysis: Machines Are Like Us

Interesting post but in particular for:

We are very aware of the importance of industry-specific language here at Brandwatch and we do our best to offer language analysis that specialises in industries as much as possible.

We constantly refine our language systems by adding newly trained classifiers (a classifier is the particular system used to detect and analyse the language of a query’s matches – which classifier should be used is determined upon query creation).

We have over 500 classifiers for different industries across the 17 languages we cover.

Did you catch that? Over 500 classifiers for different industries.

In other words, we don’t need a single classifier that does all the heavy lifting on entity recognition for building topic maps. We could, for example, train a classifier for use with all the journals in a field or sub-field. For astronomy, for example, we don’t have to disambiguate all the various uses of “Venus” but can concentrate on the one most likely to be found in a sub-set of astronomy literature.

By using specialized classifiers, perhaps we can reduce the target for more generalized classifiers to a manageable size.

uClassify

Saturday, July 2nd, 2011


From the webpage:

uClassify is a free web service where you can easily create your own text classifiers. You can also directly use classifiers that have already been shared by the community.


  • Language detection
  • Web page categorization
  • Written text gender and age recognition
  • Mood
  • Spam filter
  • Sentiment
  • Automatic e-mail support
  • See below for some examples

So what do you want to classify on? Only your imagination is the limit!

As of 1 July 2011, thirty-seven public classifiers are waiting on you and your imagination.

The emphasis is on tagging documents.

How useful is tagging documents when a search results in > 100 documents? Would your answer be the same or different if the search results were < 20 documents? What if the search results were > 500 documents?

I first saw this at textifter blog in the post A Classifier for the Masses.

FPGA Based Face Detection System Using Haar Classifiers

Monday, June 27th, 2011

FPGA Based Face Detection System Using Haar Classifiers

From the abstract:

This paper presents a hardware architecture for face detection based system on AdaBoost algorithm using Haar features. We describe the hardware design techniques including image scaling, integral image generation, pipelined processing as well as classifier, and parallel processing multiple classifiers to accelerate the processing speed of the face detection system. Also we discuss the optimization of the proposed architecture which can be scalable for configurable devices with variable resources. The proposed architecture for face detection has been designed using Verilog HDL and implemented in Xilinx Virtex-5 FPGA. Its performance has been measured and compared with an equivalent software implementation. We show about 35 times increase of system performance over the equivalent software implementation.
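The "integral image generation" step mentioned in the abstract is easy to show in software. What follows is a plain Python sketch of the general technique, not the paper's Verilog design, and the function names are mine. Once the integral image is built, the sum of any rectangle costs four lookups, which is what makes Haar features cheap to evaluate.

```python
def integral_image(image):
    """ii[y][x] holds the sum of all pixels above and to the left of (y, x)."""
    height, width = len(image), len(image[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four integral-image lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_haar_feature(ii, top, left, height, width):
    """Left half minus right half of a window: a simple vertical-edge feature."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
ii = integral_image(image)
print(rect_sum(ii, 0, 1, 2, 2))               # 2 + 3 + 6 + 7 = 18
print(two_rect_haar_feature(ii, 0, 0, 2, 4))  # (1+2+5+6) - (3+4+7+8) = -8
```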

Of interest for topic map applications designed to associate data with particular individuals.

The page offers useful links to other face recognition material.

Advanced Topics in Machine Learning

Thursday, June 23rd, 2011

Advanced Topics in Machine Learning

A course by Andreas Krause and Daniel Golovin at Caltech. The lecture notes and readings will keep you entertained for some time.


How can we gain insights from massive data sets?

Many scientific and commercial applications require us to obtain insights from massive, high-dimensional data sets. In particular, in this course we will study:

  • Online learning: How can we learn when we cannot fit the training data into memory? We will cover no regret online algorithms; bandit algorithms; sketching and dimension reduction.
  • Active learning: How should we choose few expensive labels to best utilize massive unlabeled data? We will cover active learning algorithms, learning theory and label complexity.
  • Nonparametric learning on large data: How can we let complexity of classifiers grow in a principled manner with data set size? We will cover large-­scale kernel methods; Gaussian process regression, classification, optimization and active set methods.
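As a taste of the "no regret" idea in the first bullet, here is a short Python sketch of Hedge (multiplicative weights), one standard no-regret online algorithm. I am not claiming the course presents it this way; the function name `hedge` and the learning rate `eta` are my own choices. Each round is seen once and then discarded, so nothing beyond one weight per expert is kept in memory.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge over a stream of per-expert losses in [0, 1].

    Returns the learner's cumulative expected loss and its final
    probability distribution over the experts.
    """
    weights = None
    expected_loss = 0.0
    for losses in loss_rounds:
        if weights is None:
            weights = [1.0] * len(losses)
        total = sum(weights)
        probs = [w / total for w in weights]
        # Loss suffered when following an expert drawn from probs.
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Shrink the weight of each expert exponentially in its loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return expected_loss, [w / total for w in weights]


# Two experts: the first is always right, the second always wrong.
rounds = ([0.0, 1.0] for _ in range(20))
loss, final_probs = hedge(rounds, eta=0.5)
print(loss, final_probs)
```

The learner's total loss stays within a small additive term of the best expert's loss, which is what "no regret" refers to.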

Why would a non-strong AI person list so much machine learning stuff?

Two reasons:

1) Machine learning techniques are incredibly useful in appropriate cases.

2) You have to understand machine learning to pick out the appropriate cases.

DiscoverText Free Tutorial Webinar

Monday, June 6th, 2011

DiscoverText Free Tutorial Webinar

Tuesday June 7 at 12:00 PM EST (Noon)

From the webinar announcement:

This Webinar introduces new and existing DiscoverText users to the basic document ingest, search & code features, takes your questions, and demonstrates our newest tool, a machine-learning classifier that is currently in beta testing. This is also a chance to preview our “New Navigation” and advanced filters.

DiscoverText’s latest additions to our “Do it Yourself” platform can be easily trained to perform customized mood, sentiment and topic classification. Any custom classification scheme or topic model can be created and implemented by the user. You can also generate tag clouds and drill into the most frequently occurring terms or use advanced search and filters to create “buckets” of text.

The system makes it possible to capture, share and crowd source text data analysis in novel ways. For example, you can collect text content off Facebook, Twitter & YouTube, as well as other social media or RSS feeds. Dataset owners can assign their “peers” to coding tasks. It is simple to measure the reliability of two or more coder’s choices. A distinctive feature is the ability to adjudicate coder choices for training purposes or to report validity by code, coder or project.

So please join us Tuesday June 7 at 12:00 PM EST (Noon) for an interactive Webinar. Find out why sorting thousands of items from social media, email and electronic document repositories is easier than ever. Participants in the Webinar will be invited to become beta testers of the new classification application.

I haven’t tried the software, free version or otherwise, but will try to attend the webinar and report back.

What are the Differences between Bayesian Classifiers and Mutual-Information Classifiers?

Tuesday, May 3rd, 2011

What are the Differences between Bayesian Classifiers and Mutual-Information Classifiers?

I am sure we have all lain awake at night worrying about this question at some point. 😉

Seriously, the paper shows that Bayesian and mutual information classifiers complement each other in classification roles, and it merits your attention.


In this study, both Bayesian classifiers and mutual information classifiers are examined for binary classifications with or without a reject option. The general decision rules in terms of distinctions on error types and reject types are derived for Bayesian classifiers. A formal analysis is conducted to reveal the parameter redundancy of cost terms when abstaining classifications are enforced. The redundancy implies an intrinsic problem of “non-consistency” for interpreting cost terms. If no data is given to the cost terms, we demonstrate the weakness of Bayesian classifiers in class-imbalanced classifications. On the contrary, mutual-information classifiers are able to provide an objective solution from the given data, which shows a reasonable balance among error types and reject types. Numerical examples of using two types of classifiers are given for confirming the theoretical differences, including the extremely-class-imbalanced cases. Finally, we briefly summarize the Bayesian classifiers and mutual-information classifiers in terms of their application advantages, respectively.

After detailed analysis, which will be helpful in choosing appropriate situations for the use of Bayesian or mutual information classifiers, the paper concludes:

Bayesian and mutual-information classifiers are different essentially from their applied learning targets. From application viewpoints, Bayesian classifiers are more suitable to the cases when cost terms are exactly known for trade-off of error types and reject types. Mutual-information classifiers are capable of objectively balancing error types and reject types automatically without employing cost terms, even in the cases of extremely class-imbalanced datasets, which may describe a theoretical interpretation why humans are more concerned about the accuracy of rare classes in classifications.
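A small Python illustration of the class-imbalance point, under my own simplifying assumptions: no reject option, and mutual information computed directly from a confusion matrix of counts (rows are true classes, columns are predicted classes). This is not the paper's method, only a way to see why accuracy and mutual information can disagree.

```python
import math


def accuracy(confusion):
    """Fraction of items on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def mutual_information(confusion):
    """I(true; predicted) in bits, estimated from confusion-matrix counts."""
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count:
                mi += (count / total) * math.log2(
                    count * total / (row_totals[i] * col_totals[j])
                )
    return mi


# 990 negatives and 10 positives.
always_negative = [[990, 0], [10, 0]]   # ignores the rare class entirely
finds_positives = [[940, 50], [2, 8]]   # more errors, but detects the rare class

for name, cm in [("always negative", always_negative),
                 ("finds positives", finds_positives)]:
    print(name, accuracy(cm), mutual_information(cm))
```

The classifier that never predicts the rare class has the higher accuracy yet shares zero information with the true labels, while the second classifier scores lower on accuracy but higher on mutual information.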

Biomedical Machine Learning Classifiers

Monday, April 18th, 2011

A Software Framework for Building Biomedical Machine Learning Classifiers through Grid Computing Resources by Raul Pollán, Miguel Angel Guevara Lopez and Eugénio da Costa Oliveira.


This paper describes the BiomedTK software framework, created to perform massive explorations of machine learning classifiers configurations for biomedical data analysis over distributed Grid computing resources. BiomedTK integrates ROC analysis throughout the complete classifier construction process and enables explorations of large parameter sweeps for training third party classifiers such as artificial neural networks and support vector machines, offering the capability to harness the vast amount of computing power serviced by Grid infrastructures. In addition, it includes classifiers modified by the authors for ROC optimization and functionality to build ensemble classifiers and manipulate datasets (import/export, extract and transform data, etc.). BiomedTK was experimentally validated by training thousands of classifier configurations for representative biomedical UCI datasets reaching in little time classification levels comparable to those reported in existing literature. The comprehensive method herewith presented represents an improvement to biomedical data analysis in both methodology and potential reach of machine learning based experimentation.

I recommend a close reading of the article but the concluding lines caught my eye:

…tuning classifier parameters is mostly a heuristic task, not existing rules providing knowledge about what parameters to choose when training a classifier. Through BiomedTK we are gathering data about performance of many classifiers, trained each one with different parameters, ANNs, SVM, etc. This by itself constitutes a dataset that can be data mined to understand what set of parameters yield better classifiers for given situations or even generally. Therefore, we intend to use BiomedTK on this bulk of classifier data to gain insight on classifier parameter tuning.

The dataset about training classifiers may be as important as, if not more important than, the use of the framework in harnessing Grid computing resources for biomedical analysis. Looking forward to reports on that dataset.

The Wekinator

Monday, April 11th, 2011

The Wekinator: Software for using machine learning to build real-time interactive systems

This looks very cool!

I can imagine topic maps of sounds/gestures in a number of contexts that would be very interesting.

From the website:

The Wekinator is a free software package to facilitate rapid development of and experimentation with machine learning in live music performance and other real-time domains. The Wekinator allows users to build interactive systems by demonstrating human actions and computer responses, rather than by programming.

Example applications:

  • Creation of new musical instruments
    • Create mappings between gesture and computer sounds. Control a drum machine using your webcam! Play Ableton using a Kinect!

  • Creation of gesturally-controlled animations and games
    • Control interactive visual environments like Processing or Quartz Composer, or game engines like Unity, using gestures sensed from webcam, Kinect, Arduino, etc.

  • Creation of systems for gesture analysis and feedback
    • Build classifiers to detect which gesture a user is performing. Use the identified gesture to control the computer or to inform the user how he’s doing.

  • Creation of real-time music information retrieval and audio analysis systems
    • Detect instrument, genre, pitch, rhythm, etc. of audio coming into the mic, and use this to control computer audio, visuals, etc.

  • Creation of other interactive systems in which the computer responds in real-time to some action performed by a human user (or users)
    • Anything that can output OSC can be used as a controller
    • Anything that can be controlled by OSC can be controlled by Wekinator
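"Anything that can output OSC" includes a few lines of Python. The sketch below hand-encodes an OSC 1.0 message with float arguments using only the standard library. The address `/wek/inputs`, the host and port 6448 are assumptions on my part about a typical Wekinator setup, so check your own configuration before relying on them.

```python
import socket
import struct


def osc_string(text):
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, floats):
    """Build an OSC message whose arguments are all 32-bit big-endian floats."""
    type_tags = "," + "f" * len(floats)
    payload = struct.pack(">%df" % len(floats), *floats)
    return osc_string(address) + osc_string(type_tags) + payload


def send_inputs(values, host="127.0.0.1", port=6448, address="/wek/inputs"):
    """Send one UDP datagram of input values (host, port and address are assumed defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_message(address, values), (host, port))


message = osc_message("/wek/inputs", [0.5, 1.0])
print(len(message), message)
```

Calling `send_inputs([0.5, 1.0])` in a loop fed by sensor readings would turn any script into a controller.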

How to apply Naive Bayes Classifiers to document classification problems.

Wednesday, March 23rd, 2011

How to apply Naive Bayes Classifiers to document classification problems.

Nils Haldenwang does a good job of illustrating the actual application of a naive Bayes classifier to document classification.

A good introduction to an important topic for the construction of topic maps.