Saturday, May 4th, 2013

Search: Emergent and Extrinsic Semantics by John Tait.



Semantics is a term often used in the search technology and information retrieval community these days. A distinction is drawn between semantic and traditional search, implying that somehow semantic search is a more advanced or sophisticated form.

My claim in this article is that there are actually two forms of semantic search: emergent and extrinsic. Further, I want to claim that they are related, and that one of them (emergent) is not new but has been in widespread use since the 1980s, when “natural language querying” (as embodied in Google for example) started to supplant pure Boolean querying as the usual query form for search on unstructured data.

My dictionary defines semantics as the branch of the science of language related to meaning. In search technology and information retrieval it has come to be associated with two very distinct ideas and communities.


Now it is very common to see emergent and extrinsic as somehow contrasting and irreconcilable. Whereas I want to claim they are really two sides of the same coin, and further complementary and supporting.

It is common for those in the extrinsic (really semantic web) community to be somewhat dismissive towards the emergent community, seeing the basis of their work as lacking (real) semantics. This misses the point, which is that there must be some notion of semantics in emergent systems, because even simple word matching is dealing with semantic notions like synonymy: crudely, the same words (space-delimited strings of characters in English text) in similar contexts often mean the same. The problem is that emergent semantics are obscure, hidden, and difficult to access.

In my view the difficulty of making visible the knowledge hidden in the term weighting schemes and indexing systems has led people to make the mistaken jump to the conclusion that these systems contain no semantics. My claim is that they do have semantics: but emergent semantics are generally obscure.

My first difficulty with John’s position is his odd use of the term “emergent semantics.”

He appears to define the term by example, as: “…term weighting schemes and indexing systems….”

A more common definition is found in Emergent Semantics by Philippe Cudre-Mauroux:

Emergent semantics applies the conception of a closed correspondence continuum to the analysis of semantics in distributed information systems, by promoting recursive analyses of syntactic constructs - such as schemas, ontologies or mappings - in order to capture semantics.

Nor is it useful to claim that Tait-Emergent-Semantics (to distinguish it from the more common usage) has semantics but they are obscure.

If the semantics of Tait-Emergent-Semantics cannot be seen, then other evidence should be offered.

Saying that evidence for a proposition is “obscure” isn’t very convincing.