On nonmetric similarity search problems in complex domains

Saturday, February 25th, 2012

On nonmetric similarity search problems in complex domains by Tomáš Skopal and Benjamin Bustos.


The task of similarity search is widely used in various areas of computing, including multimedia databases, data mining, bioinformatics, social networks, etc. In fact, retrieval of semantically unstructured data entities requires a form of aggregated qualification that selects entities relevant to a query. A popular type of such a mechanism is similarity querying. For a long time, the database-oriented applications of similarity search employed the definition of similarity restricted to metric distances. Due to its topological properties, metric similarity can be effectively used to index a database which can then be queried efficiently by so-called metric access methods. However, together with the increasing complexity of data entities across various domains, in recent years there appeared many similarities that were not metrics—we call them nonmetric similarity functions. In this article we survey domains employing nonmetric functions for effective similarity search, and methods for efficient nonmetric similarity search. First, we show that the ongoing research in many of these domains requires complex representations of data entities. Simultaneously, such complex representations allow us to model also complex and computationally expensive similarity functions (often represented by various matching algorithms). However, the more complex similarity function one develops, the more likely it will be a nonmetric. Second, we review state-of-the-art techniques for efficient (fast) nonmetric similarity search, concerning both exact and approximate search. Finally, we discuss some open problems and possible future research trends.
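
For readers who want the metric/nonmetric distinction in the abstract made concrete, here is a minimal sketch. It is not taken from the survey. The function names and the three sample points are my own illustrative assumptions. It checks the triangle inequality, the metric property that nonmetric functions most often break, for plain Euclidean distance and for squared Euclidean distance on one-dimensional points.

```python
from itertools import permutations


def euclidean(x, y):
    """Absolute difference: a metric on the real line."""
    return abs(x - y)


def squared_euclidean(x, y):
    """Squared difference: symmetric and non-negative, but not a metric."""
    return (x - y) ** 2


def triangle_violations(dist, points):
    """Return every ordered triple (a, b, c) where the direct distance
    dist(a, c) exceeds the detour dist(a, b) + dist(b, c)."""
    return [
        (a, b, c)
        for a, b, c in permutations(points, 3)
        if dist(a, c) > dist(a, b) + dist(b, c)
    ]


# Hypothetical sample data, chosen only to make the violation easy to see.
points = [0, 1, 2]

metric_violations = triangle_violations(euclidean, points)
nonmetric_violations = triangle_violations(squared_euclidean, points)

print("Euclidean violations:", metric_violations)
print("Squared Euclidean violations:", nonmetric_violations)
```

With squared distance, going from 0 to 2 directly costs 4, while the detour through 1 costs only 2. Metric access methods rely on the triangle inequality to prune candidates, so a function like this cannot be indexed by them without further work.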

The first paragraph of the conclusion of this survey on nonmetric similarity is an argument for topic maps (or at least the result of using a topic map):

In this article, we have surveyed the current situation concerning the employment of nonmetric similarity functions for effective and efficient similarity search in complex domains. One of the main results of the article is a surprising revelation that nonmetric similarity measuring is widely used in isolated domains, spanning many areas of interdisciplinary research. This includes multimedia databases, time series, and medical, scientific, chemical, and bioinformatic tasks, among others. (emphasis added)

True enough, survey articles such as this one may tempt a few researchers, and possibly graduate students, to peek over the discipline walls, however briefly. But research articles need to cite the literature of other disciplines routinely, showing a current awareness of other fields, both to take advantage of advances in those fields and to serve as an example for the next generation of researchers.