Meet the algorithm that can learn “everything about anything” by Derrick Harris.
From the post:
One of the more interesting projects is a system called LEVAN, which is short for Learn EVerything about ANything and was created by a group of researchers out of the Allen Institute for Artificial Intelligence and the University of Washington. One of them, Carlos Guestrin, is also co-founder and CEO of a data science startup called GraphLab. What’s really interesting about LEVAN is that it’s neither human-supervised nor unsupervised (like many deep learning systems), but what its creators call “webly supervised.”
What that means, essentially, is that LEVAN uses the web to learn everything it needs to know. It scours Google Books Ngrams to learn common phrases associated with a particular concept, then searches for those phrases in web image repositories such as Google Images, Bing and Flickr. For example, LEVAN now knows that “heavyweight boxing,” “boxing ring” and “ali boxing” are all part of the larger concept of “boxing,” and it knows what each one looks like.
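To make the pipeline concrete, here is a minimal sketch of the two-stage “webly supervised” loop the post describes: mine phrases associated with a concept, then gather candidate images for each phrase. Everything in it is a stand-in: the ngram counts are hard-coded, fetch_image_urls fakes an image-search call (the real system queries Google Images, Bing and Flickr, which require API keys), and the min_count threshold is an invented parameter. Treat it as an illustration of the idea, not LEVAN's actual code.

```python
# Minimal sketch of a LEVAN-style "webly supervised" pipeline.
# Stand-in data and stubs throughout; the real system mines Google
# Books Ngrams and queries web image-search APIs.

from collections import Counter

# Stand-in for phrases mined from Google Books Ngrams, with made-up counts.
NGRAM_COUNTS = Counter({
    "heavyweight boxing": 9500,
    "boxing ring": 12000,
    "ali boxing": 4300,
    "boxing day": 80000,   # frequent but not a visual sub-concept: a real pitfall
    "shadow boxing": 6100,
})

def mine_subconcepts(concept: str, min_count: int = 1000) -> list[str]:
    """Stage 1: keep frequent ngram phrases that contain the concept word.
    min_count is an invented pruning threshold for this sketch."""
    return [
        phrase for phrase, count in NGRAM_COUNTS.items()
        if concept in phrase.split() and count >= min_count
    ]

def fetch_image_urls(phrase: str, limit: int = 5) -> list[str]:
    """Stage 2 (stub): stand-in for querying Google Images, Bing or Flickr.
    A real implementation would call an image-search API here."""
    slug = phrase.replace(" ", "_")
    return [f"https://images.example.com/{slug}/{i}.jpg" for i in range(limit)]

def learn_everything_about(concept: str) -> dict[str, list[str]]:
    """Associate each mined sub-concept phrase with candidate training images."""
    return {phrase: fetch_image_urls(phrase) for phrase in mine_subconcepts(concept)}

if __name__ == "__main__":
    model = learn_everything_about("boxing")
    for phrase, urls in model.items():
        print(f"{phrase}: {len(urls)} candidate images")
```

Note that raw ngram mining over-generates: “boxing day” matches the concept word but isn't a visual sub-concept of boxing. As I understand it, the real system prunes such phrases by checking whether their retrieved images are visually consistent, a step this sketch omits.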
When I said “sort of” in the title, I didn't mean any disrespect to LEVAN. On the contrary, the researchers' decision to limit LEVAN to Google Books Ngrams and web images is a brilliant move. It restricts LEVAN to the semantic debris that can be found in public image repositories, but depending upon your requirements, that may be more than sufficient.
The other upside is that, despite a pending patent (sigh), the source code is available for research/academic purposes.
What data sets make useful limits for your AI/machine learning algorithm? Your application need not understand intercepted phone conversations, Barbara Walters, or popular music if those are not in your requirements. Simplifying your AI problem may be the first step towards solving it.