Communicating and resolving entity references by R.V. Guha.
Abstract:
Statements about entities occur everywhere, from newspapers and web pages to structured databases. Correlating references to entities across systems that use different identifiers or names for them is a widespread problem. In this paper, we show how shared knowledge between systems can be used to solve this problem. We present “reference by description”, a formal model for resolving references. We provide some results on the conditions under which a randomly chosen entity in one system can, with high probability, be mapped to the same entity in a different system.
An eye appointment is going to prevent me from reading this paper closely today.
From a quick scan, do you think Guha is making a distinction between entities and subjects (in the topic map sense)?
What do you make of literals having no identity beyond their encoding? (page 4, #3)
What about redundant descriptions (page 7)? Would defining a set of properties that must match qualify? Or even just additional subject indicators?
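To make the question about a set of properties that must match a bit more concrete, here is a minimal sketch in Python. It is not Guha's formal model, only an illustration of matching a description against a receiver's entities. The `resolve` function, the entity ids, and the sample data are all hypothetical.

```python
from typing import Dict, List

# An entity (or a description of one) is modeled as property/value pairs.
# This representation is an assumption made for illustration only.
Entity = Dict[str, str]


def resolve(description: Entity, entities: Dict[str, Entity]) -> List[str]:
    """Return the ids of all entities that agree with every pair in the description."""
    return [
        entity_id
        for entity_id, props in entities.items()
        if all(props.get(key) == value for key, value in description.items())
    ]


# Hypothetical entities known to the receiving system.
receiver = {
    "e1": {"name": "Springfield", "state": "Illinois", "type": "city"},
    "e2": {"name": "Springfield", "state": "Missouri", "type": "city"},
}

# A name alone matches both entities, so the reference stays ambiguous.
ambiguous = resolve({"name": "Springfield"}, receiver)

# Adding one more property is enough to pick out a single entity.
unique = resolve({"name": "Springfield", "state": "Illinois"}, receiver)

# A further property does not change the result here; in that sense it is redundant.
redundant = resolve(
    {"name": "Springfield", "state": "Illinois", "type": "city"}, receiver
)
```

In this toy setting the extra property is redundant only relative to this receiver's data. A receiver that recorded the state differently, or not at all, could not match on `state`, but might still match on another property it does share with the sender.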
Expect to see a lot more comments on this paper.
Enjoy!
I first saw this in a tweet by Stefano Bertolo.