Topic Maps, Google and the Billion Fact Parade

Andrew Hogue (Google) actually titled his presentation on Google’s plan for Freebase: The Structured Search Engine.

Several minutes into the presentation, Hogue points out that to answer the question “when was Martin Luther King, Jr. born?”, the attribute names date of birth, date born, appeared, and dob were all treated as synonyms that expect the date type.

Hmmm, he must mean keys that represent the same subject and so are subject to merging and possibly, depending on their role in a subject representative, to further merging of those subject representatives. Can you say Steve Newcomb and the TMRM?

Yes, attribute names represent subjects, just as collections of attributes are thought to represent subjects. And they benefit from rules specifying subject identity, other properties, and merging. (Some of those rules can be derived from mechanical analysis, others probably not.)
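To make the point concrete, here is a minimal sketch in Python of attribute names treated as subjects. The synonym table, the identifier "birth-date", and the function name merge_records are my own illustrative assumptions, not anything shown in Hogue's presentation or taken from Freebase.

```python
from datetime import date

# Hypothetical synonym table: each attribute name points at the subject it
# represents. The identifier "birth-date" is made up for illustration.
ATTRIBUTE_SUBJECTS = {
    "date of birth": "birth-date",
    "date born": "birth-date",
    "appeared": "birth-date",
    "dob": "birth-date",
}


def merge_records(records):
    """Collect values from several records under the subject each key represents."""
    merged = {}
    for record in records:
        for name, value in record.items():
            # Unknown attribute names are kept as-is rather than guessed at.
            subject = ATTRIBUTE_SUBJECTS.get(name.strip().lower(), name)
            merged.setdefault(subject, set()).add(value)
    return merged


# Three sources that use different keys for the same subject.
sources = [
    {"date of birth": date(1929, 1, 15)},
    {"DOB": date(1929, 1, 15)},
    {"Date Born": date(1929, 1, 15)},
]
merged = merge_records(sources)
```

A fixed lookup table is the simplest possible merging rule. Rules that depend on the role a key plays in a larger subject representative would need more than this.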

Second, Hogue points out that Freebase had 13 million entities when purchased by Google. He speculates on taking that to 1 billion entities.

Let’s cut to the chase: I will see Hogue’s 1 billion entities and raise him 9 billion entities for a total pot of 10 billion entities.

Now what?

Let’s take a simple question that Hogue’s 10 billion entity Google/Freebase cannot usefully answer.

What is democracy?

Seems simple enough. (Viewers at home can try this with their favorite search engine.)

1) United States State Department: Democracy means a state that supports Israel, keeps the Suez Canal open and opposes people we don’t like in the U.S. Oh, and that protects the rights and social status of the wealthy, almost forgot that one. Sorry.

2) Protesters in Egypt (my view): Democracy probably does not include some or all of the points I mention for #1.

3) Turn of the century U.S.: Effectively only the white male population participates.

4) Early U.S. history: Land ownership is a requirement.

I am sure examples can be supplied from other “democracies” and their histories around the world.

This is a very important term, and its differing use by different people in different contexts is going to make discussion and negotiations more difficult.

There are lots of terms where no single “entity” or “fact” is going to work for everyone.

Subject identity is a tough question and the identification of a subject changes over time, social context, etc. Not to mention that the subjects identified by particular identifications change as well.

Consider that at one time cab was not used to refer to a method of transportation but to a brothel. You may object that was “slang” usage, but if I am searching an index of police reports for that time period for raids on brothels, your objection isn’t helpful. It doesn’t matter if the usage is “slang” or not, I need to obtain accurate results.
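The cab example can be sketched as names that identify subjects only within a scope. Everything here is hypothetical: the ScopedNames class, the scope labels "period-slang" and "present-day", and the two report texts are placeholders I made up, not real vocabularies or records.

```python
from collections import defaultdict


class ScopedNames:
    """Map (term, scope) pairs to subjects, so one term can identify different subjects."""

    def __init__(self):
        self._by_term = defaultdict(set)     # (term, scope) -> subjects
        self._by_subject = defaultdict(set)  # (subject, scope) -> terms

    def add(self, term, scope, subject):
        term = term.lower()
        self._by_term[(term, scope)].add(subject)
        self._by_subject[(subject, scope)].add(term)

    def subjects(self, term, scope):
        """Subjects that the term identifies within the given scope."""
        return set(self._by_term.get((term.lower(), scope), set()))

    def terms(self, subject, scope):
        """Terms used for the subject within the given scope."""
        return set(self._by_subject.get((subject, scope), set()))


names = ScopedNames()
# Scope labels are placeholders, not real dates or controlled vocabularies.
names.add("cab", "period-slang", "brothel")
names.add("brothel", "period-slang", "brothel")
names.add("cab", "present-day", "hired vehicle")

# Made-up report texts, for illustration only.
reports = ["Raid on a cab in the lower ward", "Theft reported at the docks"]
search_terms = names.terms("brothel", "period-slang")
hits = [r for r in reports if any(t in r.lower().split() for t in search_terms)]
```

Searching by subject within a scope, rather than by a single string, is what lets the query for brothel raids pick up the report that only says cab.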

User expectations and needs cannot (or at least should not in my opinion) be adapted to the limitations of a particular approach or technology.

Particularly when we already know of strategies that can help with, not solve, the issues surrounding subject identity.

The first step that Hogue and Google have taken, recognizing that attribute names can have synonyms, is a good start. In topic map terms, it amounts to recognizing that information structures are composed of subjects as well, so that we can map between information structures rather than replacing one with another. (Or having religious discussions about which one is better, etc.)
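As a rough sketch of mapping between information structures instead of replacing one with another: each structure keeps its own field names and only declares which subject each field represents. The two schemas, the subject identifiers, and the translate function are assumptions of mine for illustration.

```python
# Hypothetical schemas: each maps its own field names to shared subject identifiers.
SCHEMA_A = {"dob": "birth-date", "full_name": "person-name"}
SCHEMA_B = {"date born": "birth-date", "name": "person-name"}


def translate(record, source, target):
    """Re-express a record in the target schema's field names via shared subjects."""
    subject_to_target = {subject: field for field, subject in target.items()}
    translated = {}
    for field, value in record.items():
        subject = source.get(field)
        # Fields with no shared subject are left out rather than guessed at.
        if subject in subject_to_target:
            translated[subject_to_target[subject]] = value
    return translated


record_a = {"dob": "1929-01-15", "full_name": "Martin Luther King, Jr."}
record_b = translate(record_a, SCHEMA_A, SCHEMA_B)
round_trip = translate(record_b, SCHEMA_B, SCHEMA_A)
```

Neither schema has to change or win. The mapping lives alongside both, and a third structure can join by declaring its own field-to-subject table.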

Hogue and Google are already on the way to treating some subjects as worthy of more effort than others, but for those that merit the attention, solving the issue of reliable, repeatable subject identification is non-trivial.

Topic maps can make a number of suggestions that can help with that task.

3 Responses to “Topic Maps, Google and the Billion Fact Parade”

  1. […] This post was mentioned on Twitter by Geir Ove Grønmo, Patrick Durusau. Patrick Durusau said: Topic Maps, Google and the Billion Fact Parade, #topicmaps #google #freebase […]

  2. Robert Barta says:

    > This is a very important term, and its differing use by different
    > people in different contexts is going to make discussion and
    > negotiations more difficult.

    All true, but the agenda is – and has been – to produce ONE FLAT WORLD.

    People want simplicity, and that is what they get. Only the “simpler”
    (on the surface) technology prevails, and with the ever-faster cycle of
    technology rollover, there is little time for technology to differentiate.

    Mao would be happy about this constant “revolution” :-)


  3. Patrick Durusau says:

    @Robert – well, true, but only the agenda of those who see only a flat world.

    Those were the same people who were warning people away from careers in physics in the late 19th century because there were only a couple of niggling problems and they would be solved soon. (Hint for other readers: It didn’t turn out that way.)

    It has been ever thus and yet technological progress does happen. Despite the flat land imperative. There probably is some sociology literature on that somewhere.

    I take that as more of a challenge than a cause for despair.