John Giannandrea on Freebase – A Rosetta Stone for Entities by Daniel Tunkelang.
From the post:
John started by introducing Freebase as a representation of structured objects corresponding to real-world entities and connected by a directed graph of relationships. In other words, a semantic web. While it isn’t quite web-scale, Freebase is a large and growing knowledge base consisting of 25 million entities and 500 million connections — and doubling annually. The core concept in Freebase is a type, and an entity can have many types. For example, Arnold Schwarzenegger is a politician and an actor. John emphasized the messiness of the real world. For example, most actors are people, but what about the dog who played Lassie? It’s important to support exceptions.
The main technical challenge for Freebase is reconciliation — that is, determining how similar a set of data is to existing Freebase topics. John pointed out how critical it is for Freebase to avoid duplication of content, since the utility of Freebase depends on unique nodes in its graph corresponding to unique objects in the world. Freebase obtains many of its entities by reconciling large, open-source knowledge bases — including Wikipedia, WordNet, Library of Congress Authorities, and metadata from the Stanford Library. Freebase uses a variety of tools to implement reconciliation, including Google Refine (formerly known as Freebase Gridworks) and Matchmaker, a tool for gathering human judgments. While reconciliation is a hard technical problem, it is made possible by making inferences across the web of relationships that link entities to one another.
John then presented Freebase as a Rosetta Stone for entities on the web. Since an entity is simply a collection of keys (one of which is its name), Freebase’s job is to reverse engineer the key-value store that is distributed among the entity’s web references, e.g., the structured databases backing web sites and encoding keys in URL parameters. He noted that Freebase itself is schema-less (it is a graph database), and that even the concept of a type is itself an entity (“Type type is the only type that is an instance of itself”). Google makes Freebase available through an API and the Metaweb Query Language (MQL).
(emphasis added)
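To make the reconciliation step in the quote concrete, here is a minimal sketch that scores an incoming record against existing topics by the key/value pairs they share. It is only an illustration, not Freebase's actual method: the function names `pair_overlap` and `best_match`, the threshold of 0.5, and the sample records are all assumptions of mine, and the sketch ignores the inference across relationships that the quote says makes reconciliation workable.

```python
def pair_overlap(candidate, topic):
    """Jaccard similarity over the (key, value) pairs of two records."""
    a = set(candidate.items())
    b = set(topic.items())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(candidate, topics, threshold=0.5):
    """Return the id of the most similar topic, or None if none reaches the threshold."""
    best_id, best_score = None, 0.0
    for topic_id, topic in topics.items():
        score = pair_overlap(candidate, topic)
        if score > best_score:
            best_id, best_score = topic_id, score
    return best_id if best_score >= threshold else None


# Hypothetical topics and a hypothetical incoming record.
topics = {
    "t1": {"name": "Arnold Schwarzenegger", "born": "1947", "profession": "actor"},
    "t2": {"name": "Lassie", "species": "dog"},
}
candidate = {"name": "Arnold Schwarzenegger", "born": "1947", "profession": "politician"}

match = best_match(candidate, topics)
```

With these made-up records the candidate shares two of four distinct pairs with `t1` and none with `t2`, so it lands on `t1`. Note that the threshold is where someone's judgment about "same object" gets baked in.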
<tedious-self-justification>…., entity is a collection of keys indeed! Key/value pairs, I would say, with no presumptions about the structure of either one.</tedious-self-justification>
There is not now nor will there ever be agreement on the “unique objects in the world.” And why should that be a value? If we have the key/value pairs, we can each arrive at our own conclusions about whether certain “unique nodes” correspond to what we think of as “unique objects in the world.”
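A rough sketch of what I mean, assuming entities are nothing more than bags of key/value pairs: the same records fall into different "unique objects" depending on which keys a given reader treats as identifying. The records, the key names, and the helper `group_by` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records; each is just a bag of key/value pairs.
records = [
    {"name": "Lassie", "kind": "character", "source": "film-db"},
    {"name": "Lassie", "kind": "dog actor", "source": "film-db"},
    {"name": "Lassie", "kind": "character", "source": "book-db"},
]


def group_by(records, identity_keys):
    """Group records that agree on every key in identity_keys."""
    groups = defaultdict(list)
    for record in records:
        signature = tuple(record.get(key) for key in identity_keys)
        groups[signature].append(record)
    return dict(groups)


# One reader says the name alone identifies the object.
by_name = group_by(records, ["name"])

# Another reader insists on name and kind together.
by_name_and_kind = group_by(records, ["name", "kind"])
```

The first reader ends up with one "unique object," the second with two, and neither had to alter the underlying pairs to get there.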
I suspect, but don’t know, having never asked former President Bush II, that we disagree on the existence of any unique objects in the world, and it is unlikely there is any evidence that would persuade either one of us to change.
Remember the Rosetta Stone had three (3) versions of the same inscription. It did not try to say one version was closer to the original than the others.
The Rosetta Stone is one of the earliest honorings of semantic diversity, unlike systems that try to push only one common semantic or vision.