Archive for the ‘Active Learning’ Category

DUALIST: Utility for Active Learning with Instances and Semantic Terms

Wednesday, April 18th, 2012

From the webpage:

DUALIST is an interactive machine learning system for quickly building classifiers for text processing tasks. It does so by asking “questions” of a human “teacher” in the form of both data instances (e.g., text documents) and features (e.g., words or phrases). It uses active learning and semi-supervised learning to build text-based classifiers at interactive speed.

The goals of this project are threefold:

  1. A practical tool to facilitate annotation/learning in text analysis projects.
  2. A framework to facilitate research in interactive and multi-modal active learning. This includes enabling actual user experiments with the GUI (as opposed to simulated experiments, which are pervasive in the literature but sometimes inconclusive for use in practice) and exploring HCI issues, as well as supporting new dual supervision algorithms which are fast enough to be interactive, accurate enough to be useful, and might make more appropriate modeling assumptions than multinomial naive Bayes (the current underlying model).
  3. A starting point for more sophisticated interactive learning scenarios that combine multiple “beyond supervised learning” strategies. See the proceedings of the recent ICML 2011 workshop on this topic.
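The combination described above, multinomial naive Bayes trained from labels on both documents and words, can be sketched in a few lines. This is only an illustration under stated assumptions, not DUALIST's actual code: the function names, the `word_boost` parameter, and its value of 50 are hypothetical choices, and a labeled word is simply treated as extra pseudo-counts for that word in its class.

```python
import math
from collections import Counter


def train_nb(labeled_docs, labeled_words, classes, vocab,
             smoothing=1.0, word_boost=50.0):
    """Fit multinomial naive Bayes from labeled documents and labeled words.

    labeled_docs:  list of (tokens, class_label) pairs
    labeled_words: dict mapping word -> class_label (the "feature labels")
    word_boost:    hypothetical pseudo-count added for each labeled word
    """
    # Start every word at the smoothing value so no probability is zero.
    counts = {c: {w: smoothing for w in vocab} for c in classes}
    doc_counts = Counter()

    # Feature labels: boost the labeled word's pseudo-count in its class.
    for word, c in labeled_words.items():
        if word in counts[c]:
            counts[c][word] += word_boost

    # Instance labels: ordinary per-class word counts.
    for tokens, c in labeled_docs:
        doc_counts[c] += 1
        for t in tokens:
            if t in counts[c]:
                counts[c][t] += 1

    n_docs = sum(doc_counts.values())
    model = {}
    for c in classes:
        total = sum(counts[c].values())
        # Add-one smoothing on the class prior as well.
        log_prior = math.log((doc_counts[c] + 1) / (n_docs + len(classes)))
        log_lik = {w: math.log(n / total) for w, n in counts[c].items()}
        model[c] = (log_prior, log_lik)
    return model


def predict_proba(model, tokens):
    """Return a dict of class -> posterior probability for a token list."""
    scores = {}
    for c, (log_prior, log_lik) in model.items():
        # Words outside the vocabulary are ignored.
        scores[c] = log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    # Normalize in log space for numerical stability.
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


# Usage: two labeled words and no labeled documents at all.
classes = ["sports", "politics"]
vocab = {"goal", "match", "vote", "election"}
model = train_nb([], {"goal": "sports", "election": "politics"}, classes, vocab)
probs = predict_proba(model, ["goal", "match"])
```

Even with zero labeled documents, the two labeled words are enough to push a document containing "goal" toward "sports", which is the appeal of asking the teacher about features as well as instances.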

This could be quite useful for authoring a topic map across a corpus of materials, with interactive recognition of occurrences of subjects, etc.

Sponsored in part by the folks at DARPA. Unlike Al Gore, they did build the Internet.

Will the Circle Be Unbroken? Interactive Annotation!

Wednesday, February 29th, 2012

I have to agree with Bob Carpenter that the title is a bit much:

Closing the Loop: Fast, Interactive Semi-Supervised Annotation with Queries on Features and Instances

From the post:

Whew, that was a long title. Luckily, the paper’s worth it:

Settles, Burr. 2011. Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances. EMNLP.

It’s a paper that shows you how to use active learning to build a reasonably high-performance classifier with only minutes of user effort. Very cool and right up our alley here at LingPipe.

Both the paper and Bob’s review merit close reading.

Active learning: far from solved

Wednesday, October 12th, 2011

From the post:

As Daniel Hsu and John Langford pointed out recently, there has been a lot of recent progress in active learning. This is to the point where I might actually be tempted to suggest some of these algorithms to people to use in practice, for instance the one John has that learns faster than supervised learning because it’s very careful about what work it performs. That is, in particular, I might suggest that people try it out instead of the usual query-by-uncertainty (QBU) or query-by-committee (QBC). This post is a brief overview of what I understand of the state of the art in active learning (paragraphs 2 and 3) and then a discussion of why I think (a) researchers don’t tend to make much use of active learning and (b) why the problem is far from solved. (a will lead to b.)

This is a deeply interesting article that could give rise to mini and major projects. I particularly like his point about not throwing away training data. No, you have to read the post for yourself. It’s not that long.
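For readers who have not met the two baseline strategies named in the excerpt, here is a minimal sketch of how query-by-uncertainty (QBU) and query-by-committee (QBC) pick the next item to label. The function names and the toy one-dimensional pool are assumptions made for illustration; entropy and vote entropy are one common way to score disagreement, not the only one, and this is not the algorithm the post recommends.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_by_uncertainty(pool, predict_proba):
    """Index of the pool item whose predicted distribution is least certain."""
    return max(range(len(pool)), key=lambda i: entropy(predict_proba(pool[i])))


def query_by_committee(pool, committee):
    """Index of the pool item on which the committee's votes are most split."""
    def vote_entropy(x):
        votes = [member(x) for member in committee]
        fractions = [votes.count(label) / len(votes) for label in set(votes)]
        return entropy(fractions)
    return max(range(len(pool)), key=lambda i: vote_entropy(pool[i]))


# Toy usage: items are numbers in [0, 1].
pool = [0.1, 0.45, 0.9]

# A stand-in "model" that reports x as the probability of the positive class.
toy_proba = lambda x: [x, 1.0 - x]

# A committee of three threshold classifiers that disagree near the middle.
committee = [lambda x, t=t: x > t for t in (0.3, 0.5, 0.7)]

qbu_choice = query_by_uncertainty(pool, toy_proba)
qbc_choice = query_by_committee(pool, committee)
```

Both strategies select the middle item here, because it is the one closest to the model's decision boundary and the only one the committee disagrees on.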