Andrew Hogue (Google) actually titled his presentation on Google’s plan for Freebase: The Structured Search Engine.
Several minutes into the presentation, Hogue points out that to answer the question “When was Martin Luther King, Jr. born?”, the keys date of birth, date born, appeared, and dob were all treated as synonyms that expect the date type.
Hmmm, he must mean keys that represent the same subject and so are subject to merging and possibly, depending on their role in a subject representative, to further merging of those subject representatives. Can you say Steve Newcomb and the TMRM?
Yes, attribute names represent subjects, just as collections of attributes are thought to represent subjects. And they benefit from rules specifying subject identity, other properties, and merging. (Some of those rules can be derived from mechanical analysis, others probably not.)
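To make the point about attribute names concrete, here is a minimal sketch in Python. It is my own illustration, not anything Hogue showed and not how Freebase or the TMRM is implemented. The table ATTRIBUTE_SUBJECTS and the functions subject_for_key and merge_keys are hypothetical names; the synonyms are the ones Hogue lists, and the sample record is made up for the example.

```python
from datetime import date

# Hypothetical table: each attribute-name subject lists the keys known to
# represent it and the value type those keys are expected to carry.
ATTRIBUTE_SUBJECTS = {
    "date_of_birth": {
        "names": {"date of birth", "date born", "appeared", "dob"},
        "type": date,
    },
}


def subject_for_key(key):
    """Return the attribute subject a key represents, or None if unknown."""
    normalized = key.strip().lower()
    for subject, spec in ATTRIBUTE_SUBJECTS.items():
        if normalized in spec["names"]:
            return subject
    return None


def merge_keys(record):
    """Group a record's values under the subject each key represents.

    Unrecognized keys, and values that do not have the expected type,
    stay under their original key.
    """
    merged = {}
    for key, value in record.items():
        subject = subject_for_key(key)
        if subject is not None and isinstance(
            value, ATTRIBUTE_SUBJECTS[subject]["type"]
        ):
            merged.setdefault(subject, set()).add(value)
        else:
            merged.setdefault(key, set()).add(value)
    return merged


# Sample record (illustrative): two keys that represent the same subject.
record = {
    "DOB": date(1929, 1, 15),
    "date born": date(1929, 1, 15),
    "name": "Martin Luther King, Jr.",
}
print(merge_keys(record))
```

The synonym table is the easy part. Deciding which keys belong in it, and under what conditions, is where the identity and merging rules come in.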
Second, Hogue points out that Freebase had 13 million entities when purchased by Google. He speculates on taking that to 1 billion entities.
Let’s cut to the chase, I will see Hogue’s 1 billion entities and raise him 9 billion entities for a total pot of 10 billion entities.
Now what?
Let’s take a simple question that Hogue’s 10 billion entity Google/Freebase cannot usefully answer.
What is democracy?
Seems simple enough. (Viewers at home can try this with their favorite search engine.)
1) United States State Department: Democracy means a state that supports Israel, keeps the Suez Canal open and opposes people we don’t like in the U.S. Oh, and that protects the rights and social status of the wealthy, almost forgot that one. Sorry.
2) Protesters in Egypt (my view): Democracy probably does not include some or all of the points I mention for #1.
3) Turn of the century U.S.: Effectively only the white male population participates.
4) Early U.S. history: Land ownership is a requirement.
I am sure examples can be supplied from other “democracies” and their histories around the world.
This is a very important term, and its differing use by different people in different contexts is going to make discussion and negotiation more difficult.
There are lots of terms where no single “entity” or “fact” is going to work for everyone.
Subject identity is a tough question and the identification of a subject changes over time, social context, etc. Not to mention that the subjects identified by particular identifications change as well.
Consider that at one time “cab” was used to refer not to a method of transportation but to a brothel. You may object that this was “slang” usage, but if I am searching an index of police reports from that time period for raids on brothels, your objection isn’t helpful. It doesn’t matter whether the usage is “slang” or not; I need to obtain accurate results.
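One way to picture the “cab” problem is a name index that is keyed by context as well as by name. What follows is a rough sketch under my own assumptions, not a topic map engine: the class ScopedNames, its methods, and the scope labels "present-day" and "period-slang" are all hypothetical.

```python
from collections import defaultdict


class ScopedNames:
    """Index from (name, scope) to the subjects that name identifies there."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, name, scope, subject):
        """Record that `name` identifies `subject` within `scope`."""
        self._index[(name.lower(), scope)].add(subject)

    def subjects(self, name, scope):
        """Subjects identified by `name` in `scope` (empty set if none)."""
        return set(self._index.get((name.lower(), scope), set()))

    def names_for(self, subject, scope):
        """All names that identify `subject` in `scope`."""
        return {
            name
            for (name, s), subjects in self._index.items()
            if s == scope and subject in subjects
        }


names = ScopedNames()
names.add("cab", "present-day", "hired vehicle")
names.add("cab", "period-slang", "brothel")
names.add("brothel", "period-slang", "brothel")

# Expanding a search of the period police reports for raids on brothels:
print(names.names_for("brothel", "period-slang"))
```

The same name yields a different subject in each scope, and a search for the subject in the period scope picks up “cab” without anyone having to rule on whether it counts as slang.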
User expectations and needs cannot (or at least should not in my opinion) be adapted to the limitations of a particular approach or technology.
Particularly when we already know of strategies that can help with, not solve, the issues surrounding subject identity.
The first step that Hogue and Google have taken, recognizing that attribute names can have synonyms, is a good start. In topic map terms, that means recognizing that information structures are composed of subjects as well, so that we can map between information structures rather than replacing one with another. (Or having religious discussions about which one is better, etc.)
Hogue and Google are already on the way to treating some subjects as worthy of more effort than others, but for those that merit the attention, reliable, repeatable subject identification is a non-trivial problem.
Topic maps can make a number of suggestions that can help with that task.