A self-training approach for resolving object coreference on the semantic web by Wei Hu, Jianfeng Chen, and Yuzhong Qu, all of Nanjing University, Nanjing, China.
Abstract:
An object on the Semantic Web is likely to be denoted with multiple URIs by different parties. Object coreference resolution is to identify “equivalent” URIs that denote the same object. Driven by the Linking Open Data (LOD) initiative, millions of URIs have been explicitly linked with owl:sameAs statements, but potentially coreferent ones are still considerable. Existing approaches address the problem mainly from two directions: one is based upon equivalence inference mandated by OWL semantics, which finds semantically coreferent URIs but probably omits many potential ones; the other is via similarity computation between property-value pairs, which is not always accurate enough. In this paper, we propose a self-training approach for object coreference resolution on the Semantic Web, which leverages the two classes of approaches to bridge the gap between semantically coreferent URIs and potential candidates. For an object URI, we firstly establish a kernel that consists of semantically coreferent URIs based on owl:sameAs, (inverse) functional properties and (max-)cardinalities, and then extend such kernel iteratively in terms of discriminative property-value pairs in the descriptions of URIs. In particular, the discriminability is learnt with a statistical measurement, which not only exploits key characteristics for representing an object, but also takes into account the matchability between properties from pragmatics. In addition, frequent property combinations are mined to improve the accuracy of the resolution. We implement a scalable system and demonstrate that our approach achieves good precision and recall for resolving object coreference, on both benchmark and large-scale datasets.
Interesting work.
In particular the use of property-value pairs in the service of discovering similarity.
So, why are users limited to owl:sameAs?
If machines can discover property-value pairs that identify “objects,” then why not enable users to declare property-value pairs that identify the same “objects?”
Such declarations could be used by both machines and users.