Learning Richly Structured Representations From Weakly Annotated Data by Daphne Koller. (DeGroot Lecture, Carnegie Mellon University, October 14, 2011).
Abstract:
The solution to many complex problems requires that we build up a representation that spans multiple levels of abstraction. For example, to obtain a semantic scene understanding from an image, we need to detect and identify objects and assign pixels to objects, understand scene geometry, derive object pose, and reconstruct the relationships between different objects. Fully annotated data for learning richly structured models can only be obtained in very limited quantities; hence, for such applications and many others, we need to learn models from data where many of the relevant variables are unobserved. I will describe novel machine learning methods that can train models using weakly labeled data, thereby making use of much larger amounts of available data, with diverse levels of annotation. These models are inspired by ideas from human learning, in which the complexity of the learned models and the difficulty of the training instances tackled changes over the course of the learning process. We will demonstrate the applicability of these ideas to various problems, focusing on the problem of holistic computer vision.
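The curriculum idea in that last part of the abstract is easy to illustrate. Here is a minimal sketch of my own (not Koller's algorithm): train on the examples the current model finds easiest, using per-example loss as the difficulty proxy, and gradually admit the harder ones. The toy logistic-regression setup, the noisy labels standing in for "hard" instances, and the 50%-to-100% schedule are all assumptions for illustration.

```python
# Minimal self-paced / curriculum sketch (illustrative, not the lecture's method).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs with 10% flipped labels acting as "hard" examples.
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100, dtype=float)
flip = rng.random(200) < 0.1
y[flip] = 1 - y[flip]

w, b = np.zeros(2), 0.0

def losses(w, b):
    """Per-example logistic loss, used as the difficulty measure."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Outer curriculum loop: grow the fraction of admitted examples over stages.
for frac in [0.5, 0.7, 0.9, 1.0]:
    for _ in range(200):                      # inner gradient steps
        k = int(frac * len(y))
        easy = np.argsort(losses(w, b))[:k]   # indices of the k easiest examples
        p = 1.0 / (1.0 + np.exp(-(X[easy] @ w + b)))
        w -= 0.1 * ((p - y[easy]) @ X[easy] / k)
        b -= 0.1 * (p - y[easy]).mean()
    acc = ((1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
    print(f"fraction {frac:.1f}: accuracy on all examples = {acc:.2f}")
```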
If your topic map application involves computer vision, this is a must-see video.
For text/data miners, are you faced with similar issues? Limited amounts of richly annotated training data?
I saw a slide (I will track it down later) that showed text ranging from plain text to text annotated with ontological data. I mention it because that isn’t what a user sees when they “read” a text. They see implied relationships, references to other subjects, other instances of a particular subject, and all of that passes in the instant of recognition.
Perhaps the problem of correct identification in text is one of too few dimensions rather than too many.