John Sowa’s posting on Watson merits reproduction in its entirety (lite editing to make it format for easy reading):
Thanks for the reminder:
Dave Ferrucci gave a talk on UIMA (the Unstructured Information Management Architecture) back in May-2006, entitled: “Putting the Semantics in the Semantic Web: An overview of UIMA and its role in Accelerating the Semantic Revolution”
I recommend that readers compare Ferrucci’s talk about UIMA in 2006 with his talk about the Watson system and Jeopardy in 2011. In less than 5 years, they built Watson on the UIMA foundation, which contained a reasonable amount of NLP tools, a modest ontology, and some useful tools for knowledge acquisition. During that time, they added quite a bit of machine learning, reasoning, statistics, and heuristics. But most of all, they added terabytes of documents.
For the record, following are Ferrucci’s slides from 2006:
Following is the talk that explains the slides:
And following is his recent talk about the DeepQA project for building and extending that foundation for Jeopardy:
Compared to Ferrucci’s talks, the PBS Nova program was a disappointment. It didn’t get into any technical detail, but it did have a few cameo appearances from AI researchers. Terry Winograd and Pat Winston, for example, said that the problem of language understanding is hard.
But I thought that Marvin Minsky and Doug Lenat said more with their tone of voice than with their words. My interpretation (which could, of course, be wrong) is that both of them were seething with jealousy that IBM built a system that was competing with Jeopardy champions on national TV — and without their help.
In any case, the Watson project shows that terabytes of documents are far more important for commonsense reasoning than the millions of formal axioms in Cyc. That does not mean that the Cyc ontology is useless, but it undermines the original assumptions for the Cyc project: commonsense reasoning requires a huge knowledge base of hand-coded axioms together with a powerful inference engine.
An important observation by Ferrucci: The URIs of the Semantic Web are *not* useful for processing natural languages — not for ordinary documents, not for scientific documents, and especially not for Jeopardy questions:
1. For scientific documents, words like ‘H2O’ are excellent URIs. Adding an http address in front of them is pointless.
2. A word like ‘water’, which is sometimes a synonym for ‘H2O’, has an open-ended number of senses and microsenses.
3. Even if every microsense could be precisely defined and cataloged on the WWW, that wouldn’t help determine which one is appropriate for any particular context.
4. Any attempt to force human being(s) to specify or select a precise sense cannot succeed unless *every* human understands and consistently selects the correct sense at *every* possible occasion.
5. Given that point #4 is impossible to enforce and dangerous to assume, any software that uses URIs will have to verify that the selected sense is appropriate to the context.
6. Therefore, URIs found “in the wild” on the WWW can never be assumed to be correct unless they have been guaranteed to be correct by a trusted source.
These points taken together imply that annotations on documents can’t be trusted unless (a) they have been generated by your own system or (b) they were generated by a system which is at least as trustworthy as your own and which has been verified to be 100% compatible with yours.
In summary, the underlying assumptions for both Cyc and the Semantic Web need to be reconsidered.
You can see the post at: http://ontolog.cim3.net/forum/ontolog-forum/2011-02/msg00114.html
I don’t always agree with Sowa but he has written extensively on conceptual graphs, knowledge representation and ontological matters. See http://www.jfsowa.com/
I missed the local showing but found the video at: Smartest Machine on Earth.
You will find a link to an interview with Minsky at that same location.
I don’t know that I would describe Minsky as “…seething with jealousy….”
While I enjoy Jeopardy and it is certainly more cerebral than say American Idol, I think Minsky is right in seeing the Watson effort as something other than artificial intelligence.
Q: In 2011, who was the only non-sentient contestant on the TV show Jeopardy?
A: What is IBM’s Watson?