Massive Query Expansion by Exploiting Graph Knowledge Bases by Joan Guisado-Gámez, David Dominguez-Sal, Josep-LLuis Larriba-Pey.
Keyword based search engines have problems with term ambiguity and vocabulary mismatch. In this paper, we propose a query expansion technique that enriches queries expressed as keywords and short natural language descriptions. We present a new massive query expansion strategy that enriches queries using a knowledge base by identifying the query concepts, and adding relevant synonyms and semantically related terms. We propose two approaches: (i) lexical expansion that locates the relevant concepts in the knowledge base; and, (ii) topological expansion that analyzes the network of relations among the concepts, and suggests semantically related terms by path and community analysis of the knowledge graph. We perform our expansions by using two versions of the Wikipedia as knowledge base, concluding that the combination of both lexical and topological expansion provides improvements of the system’s precision up to more than 27%.
Heavy reading for the weekend but this paragraph caught my eye:
In this paper, we propose a novel query expansion which uses a hybrid input in the form of a query phrase and a context, which are a set of keywords and a short natural language description of the query phrase, respectively. Our method is based on the combination of both a lexical and topological analysis of the concepts related to the input in the knowledge base. We differ from previous works because we are not considering the links of each article individually, but we are mining the global link structure of the knowledge base to find related terms using graph mining techniques. With our technique we are able to identify: (i) the most relevant concepts and their synonyms, and (ii) a set of semantically related concepts. Most relevant concepts provide equivalent reformulations of the query that reduce the vocabulary mismatch. Semantically related concepts introduce many different terms that are likely to appear in a relevant document, which is useful to solve the lack of topic expertise and also disambiguate the keywords.
Wondering that since it works with Wikipedia, should the same be true for the references but not hyperlinks of traditional publishing?
Say documents in Citeseer for example?
Nothing against Wikipedia but general knowledge doesn’t have a very high retail value.