Archive for the ‘Learning Classifier’ Category

Advanced Topics in Machine Learning

Thursday, June 23rd, 2011

Advanced Topics in Machine Learning

A course by Andreas Krause and Daniel Golovin at Caltech. Lecture notes, readings: this will keep you entertained for some time.


How can we gain insights from massive data sets?

Many scientific and commercial applications require us to obtain insights from massive, high-dimensional data sets. In particular, in this course we will study:

  • Online learning: How can we learn when we cannot fit the training data into memory? We will cover no regret online algorithms; bandit algorithms; sketching and dimension reduction.
  • Active learning: How should we choose a few expensive labels to best utilize massive unlabeled data? We will cover active learning algorithms, learning theory and label complexity.
  • Nonparametric learning on large data: How can we let the complexity of classifiers grow in a principled manner with data set size? We will cover large-scale kernel methods; Gaussian process regression, classification, optimization and active set methods.
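The "no regret online algorithms" bullet above can be made concrete with a minimal sketch of the Hedge (multiplicative weights) algorithm, one of the canonical no-regret methods. The loss sequence and learning rate below are illustrative, not tied to the course materials:

```python
import math

def hedge(loss_rounds, eta=0.5):
    """Hedge / multiplicative weights sketch: keep one weight per expert,
    play the weighted average of the experts, and after each round shrink
    each expert's weight by exp(-eta * its loss). The learner's cumulative
    loss provably stays close to that of the best single expert."""
    n = len(loss_rounds[0])
    weights = [1.0] * n
    learner_loss = 0.0
    cumulative = [0.0] * n  # each expert's total loss
    for losses in loss_rounds:
        z = sum(weights)
        # learner suffers the weighted-average loss this round
        learner_loss += sum(w * l for w, l in zip(weights, losses)) / z
        # exponential update: penalize experts with high loss
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    best = min(cumulative)
    return learner_loss, best, learner_loss - best
```

Even when one expert is always right and another always wrong, the learner's regret (its loss minus the best expert's) stays bounded rather than growing linearly with the number of rounds.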

Why would a non-strong AI person list so much machine learning stuff?

Two reasons:

1) Machine learning techniques are incredibly useful in appropriate cases.

2) You have to understand machine learning to pick out the appropriate cases.

Intelligent Ruby + Machine Learning

Saturday, February 19th, 2011

Intelligent Ruby + Machine Learning

An entertaining slide deck that argues more data is better than less data plus more models.

That is its essential point, but it does conclude with useful references.

It also has examples that may increase your interest in “machine learning.”

Consensus of Ambiguity: Theory and Application of Active Learning for Biomedical Image Analysis

Monday, October 25th, 2010

Consensus of Ambiguity: Theory and Application of Active Learning for Biomedical Image Analysis Authors: Scott Doyle, Anant Madabhushi


Supervised classifiers require manually labeled training samples to classify unlabeled objects. Active Learning (AL) can be used to selectively label only “ambiguous” samples, ensuring that each labeled sample is maximally informative. This is invaluable in applications where manual labeling is expensive, as in medical images where annotation of specific pathologies or anatomical structures is usually only possible by an expert physician. Existing AL methods use a single definition of ambiguity, but there can be significant variation among individual methods. In this paper we present a consensus of ambiguity (CoA) approach to AL, where only samples which are consistently labeled as ambiguous across multiple AL schemes are selected for annotation. CoA-based AL uses fewer samples than Random Learning (RL) while exploiting the variance between individual AL schemes to efficiently label training sets for classifier training. We use a consensus ratio to determine the variance between AL methods, and the CoA approach is used to train classifiers for three different medical image datasets: 100 prostate histopathology images, 18 prostate DCE-MRI patient studies, and 9,000 breast histopathology regions of interest from 2 patients. We use a Probabilistic Boosting Tree (PBT) to classify each dataset as either cancer or non-cancer (prostate), or high or low grade cancer (breast). Training is done using CoA-based AL, and is evaluated in terms of accuracy and area under the receiver operating characteristic curve (AUC). CoA training yielded between 0.01-0.05% greater performance than RL for the same training set size; approximately 5-10 more samples were required for RL to match the performance of CoA, suggesting that CoA is a more efficient training strategy.

The consensus of ambiguity (CoA) approach is trivially extensible to other kinds of image analysis. Intelligence photos, anyone?

What intrigues me is extending the approach to other types of data analysis.

For example, having multiple AL schemes process textual data and following the CoA approach to decide what to bounce to experts for annotation.


  1. What types of ambiguity would this approach miss?
  2. How would you apply this method to other data?
  3. How would you measure success/failure of application to other data?
  4. Design and apply this concept to specified data set. (project)

A Survey of Genetics-based Machine Learning

Thursday, October 21st, 2010

A Survey of Genetics-based Machine Learning Author: Tim Kovacs


This is a survey of the field of Genetics-based Machine Learning (GBML): the application of evolutionary algorithms to machine learning. We assume readers are familiar with evolutionary algorithms and their application to optimisation problems, but not necessarily with machine learning. We briefly outline the scope of machine learning, introduce the more specific area of supervised learning, contrast it with optimisation and present arguments for and against GBML. Next we introduce a framework for GBML which includes ways of classifying GBML algorithms and a discussion of the interaction between learning and evolution. We then review the following areas with emphasis on their evolutionary aspects: GBML for sub-problems of learning, genetic programming, evolving ensembles, evolving neural networks, learning classifier systems, and genetic fuzzy systems.
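To make "the application of evolutionary algorithms to machine learning" concrete, here is a toy sketch that evolves a one-dimensional threshold classifier by fitness-proportional survival and Gaussian mutation. It is illustrative only, and far simpler than the learning classifier systems and genetic fuzzy systems the survey covers; all names and parameters are invented for the example:

```python
import random

def evolve_threshold(points, labels, pop_size=20, generations=30, seed=0):
    """Toy genetics-based machine learning: evolve a population of 1-D
    threshold rules ("predict True iff x >= t"). Each generation, the
    fitter half survives and the population is refilled with mutated
    copies of the survivors."""
    rng = random.Random(seed)

    def fitness(t):
        # classification accuracy of the rule "predict True iff x >= t"
        return sum((x >= t) == y for x, y in zip(points, labels)) / len(points)

    # initial population: random thresholds over the data range
    pop = [rng.uniform(min(points), max(points)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        # refill with Gaussian-mutated copies of the survivors
        pop = survivors + [t + rng.gauss(0, 0.1) for t in survivors]
    return max(pop, key=fitness)
```

On linearly separable data the evolved threshold lands in the gap between the classes; the same select-and-mutate loop is the skeleton that GBML systems elaborate with crossover, rule sets, and richer representations.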

The author’s preprint has 322 references. Plus there are slides and bibliographies in BibTeX.

If you are interested in augmented topic map authoring using GBML, this would be a good starting place.


  1. Pick 3 subject areas. What arguments would you make in favor of GBML for augmenting authoring of a topic map for those subject areas?
  2. Same subject areas, but what arguments would you make against the use of GBML for augmenting authoring of a topic map for those subject areas?
  3. Design an experiment to test one of your arguments for and against GBML. (project, use of the literature encouraged)
  4. Convert the BibTeX formatted bibliographies into a topic map. (project)