On ranking relevant entities in heterogeneous networks using a language-based model by Laure Soulier, Lamjed Ben Jabeur, Lynda Tamine, Wahiba Bahsoun. (Soulier, L., Jabeur, L. B., Tamine, L. and Bahsoun, W. (2013), On ranking relevant entities in heterogeneous networks using a language-based model. J. Am. Soc. Inf. Sci. doi: 10.1002/asi.22762)
A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand and on the other hand, non topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the Bibrank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the topical query automatic generation and evaluation. We show that a statistically significant improvement over closely related ranking models is achieved.
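To make the "score propagation" idea in the abstract concrete, here is a minimal sketch in Python. It is not BibRank: the abstract gives no formulas, so the update rule (a damped average over authorship links), the `damping` parameter, and the name `propagate_scores` are all assumptions made for illustration. Citation links and the language-model score are left out entirely; the content score is simply supplied as a number per document.

```python
def propagate_scores(content_score, authorship, damping=0.5, iterations=20):
    """Jointly score documents and authors on a bi-type network.

    content_score: dict mapping document id -> query relevance (assumed given).
    authorship:    dict mapping document id -> list of author ids.
    Hypothetical update rule, not the one from the paper.
    """
    authors = {a for names in authorship.values() for a in names}
    # Invert the authorship relation: author -> documents they wrote.
    docs_of = {a: [d for d, names in authorship.items() if a in names]
               for a in authors}

    doc_score = dict(content_score)
    author_score = {a: 0.0 for a in authors}
    for _ in range(iterations):
        # An author's score is the mean score of their documents.
        author_score = {
            a: sum(doc_score[d] for d in docs_of[a]) / len(docs_of[a])
            for a in authors
        }
        # A document blends its own content score with its authors' mean score.
        new_doc_score = {}
        for d, base in content_score.items():
            names = authorship.get(d, [])
            if names:
                from_authors = sum(author_score[a] for a in names) / len(names)
                new_doc_score[d] = (1 - damping) * base + damping * from_authors
            else:
                # No authorship links: nothing to propagate, keep the content score.
                new_doc_score[d] = base
        doc_score = new_doc_score
    return doc_score, author_score


if __name__ == "__main__":
    docs, people = propagate_scores(
        {"d1": 0.9, "d2": 0.5, "d3": 0.1},
        {"d1": ["alice"], "d2": ["alice", "bob"], "d3": ["bob"]},
    )
    print(sorted(docs, key=docs.get, reverse=True))
    print(sorted(people, key=people.get, reverse=True))
```

Whatever the exact rule, the output is still an estimate: both rankings are derived from the content scores and the link structure, not from anything an author stated.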
Note the phrase “estimate the topical similarity between connected entities.”
Very good work, but rather than a declaration of similarity (as in topic maps), we have an estimate of similarity.
Before you protest about the volume of literature/data, recall that some author wrote the documents in question and selected the terms and references found therein.
Rather than guessing what may be similar to what the author wrote, why not devise a method to allow the author to say?
And then build upon those similarity/sameness declarations across heterogeneous networks of data.
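As a rough illustration of what building on declarations could look like, the sketch below merges author-supplied "these two identifiers mean the same subject" statements into groups using a union-find structure. This is an assumption about one possible mechanism, not a topic maps API and not anything from the paper; the function name `merge_declarations` and the identifiers in the example are hypothetical.

```python
def merge_declarations(declarations):
    """Group identifiers that have been declared to denote the same subject.

    declarations: iterable of (id_a, id_b) pairs, each a sameness statement.
    Returns a list of sets, one per merged subject.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in declarations:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical identifiers from three different networks.
    declared = [
        ("citeseerx:author/123", "dblp:author/abc"),
        ("dblp:author/abc", "orcid:xyz"),
        ("citeseerx:doc/9", "doi:example-9"),
    ]
    for group in merge_declarations(declared):
        print(sorted(group))
```

Because sameness is transitive here, a declaration made in one network carries over to every network an identifier has already been linked to, with no similarity score involved.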