DUALIST: Utility for Active Learning with Instances and Semantic Terms
From the webpage:
DUALIST is an interactive machine learning system for quickly building classifiers for text processing tasks. It does so by asking “questions” of a human “teacher” in the form of both data instances (e.g., text documents) and features (e.g., words or phrases). It uses active learning and semi-supervised learning to build text-based classifiers at interactive speed.
(video demo omitted)
The goals of this project are threefold:
- A practical tool to facilitate annotation/learning in text analysis projects.
- A framework to facilitate research in interactive and multi-modal active learning. This includes enabling actual user experiments with the GUI (as opposed to simulated experiments, which are pervasive in the literature but sometimes inconclusive for use in practice) and exploring HCI issues, as well as supporting new dual supervision algorithms which are fast enough to be interactive, accurate enough to be useful, and might make more appropriate modeling assumptions than multinomial naive Bayes (the current underlying model).
- A starting point for more sophisticated interactive learning scenarios that combine multiple “beyond supervised learning” strategies. See the proceedings of the recent ICML 2011 workshop on this topic.
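To make the "instances and features" idea concrete, here is a minimal sketch of dual supervision on top of multinomial naive Bayes, the model the webpage names. It is not DUALIST's actual code: the function names, the `boost` pseudo-count, and the toy vocabulary are all assumptions made for illustration, and the semi-supervised step is left out. A labeled word adds prior mass for its class, a labeled document adds observed counts, and the next "question" is the unlabeled document with the highest-entropy prediction.

```python
import math
from collections import Counter

def train_nb(labeled_docs, labeled_features, classes, vocab, alpha=1.0, boost=50.0):
    """Multinomial naive Bayes where labeled features act as extra prior counts.

    `boost` is a hypothetical pseudo-count chosen for illustration only.
    """
    # Every class/word pair starts at the smoothing value alpha.
    counts = {c: {w: alpha for w in vocab} for c in classes}
    # Feature labels: a word the teacher ties to a class gets extra prior mass.
    for word, c in labeled_features:
        counts[c][word] += boost
    # Instance labels: add the observed word counts from labeled documents.
    doc_counts = Counter()
    for words, c in labeled_docs:
        doc_counts[c] += 1
        for w in words:
            if w in counts[c]:
                counts[c][w] += 1
    total_docs = sum(doc_counts.values())
    # Smoothed class priors, so the model works even with no labeled documents.
    log_prior = {c: math.log((doc_counts[c] + 1) / (total_docs + len(classes)))
                 for c in classes}
    log_like = {}
    for c in classes:
        z = sum(counts[c].values())
        log_like[c] = {w: math.log(n / z) for w, n in counts[c].items()}
    return log_prior, log_like

def posterior(model, words):
    """Class probabilities for a bag of words (out-of-vocabulary words are skipped)."""
    log_prior, log_like = model
    scores = {c: log_prior[c] + sum(log_like[c][w] for w in words if w in log_like[c])
              for c in log_prior}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def entropy(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def most_uncertain(model, unlabeled):
    """Active-learning query: the document whose prediction has the highest entropy."""
    return max(unlabeled, key=lambda words: entropy(posterior(model, words)))

# Toy data (hypothetical): two feature labels, no labeled documents yet.
classes = ["sports", "politics"]
vocab = ["goal", "team", "vote", "senate", "win"]
labeled_features = [("goal", "sports"), ("senate", "politics")]
unlabeled = [["goal", "team"], ["vote", "senate"], ["win"]]

model = train_nb([], labeled_features, classes, vocab)
query = most_uncertain(model, unlabeled)
```

With only the two feature labels, the model is already confident about the first two documents and would ask the teacher about the one containing just "win", which neither labeled word covers.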
This could be quite useful for authoring a topic map across a corpus of materials, with interactive recognition of occurrences of subjects, etc.
Sponsored in part by the folks at DARPA. Unlike Al Gore, they did build the Internet.