Archive for the ‘Similarity Retrieval’ Category

Similarity Search and Applications

Friday, January 18th, 2013

International Conference on Similarity Search and Applications (SISAP 2013)

From the webpage:

The International Conference on Similarity Search and Applications (SISAP) is an annual forum for researchers and application developers in the area of similarity data management. It aims at the technological problems shared by numerous application domains, such as data mining, information retrieval, computer vision, pattern recognition, computational biology, geography, biometrics, machine learning, and many others that need similarity searching as a necessary supporting service.

The SISAP initiative (www.sisap.org) aims to become a forum to exchange real-world, challenging and innovative examples of applications, new indexing techniques, common test-beds and benchmarks, source code and up-to-date literature through its web page, serving the similarity search community. Traditionally, SISAP puts emphasis on the distance-based searching, but in general the conference concerns both the effectiveness and efficiency aspects of any similarity search problem.

Dates:

Paper submission: April 2013
Notification: June 2013
Final version: July 2013
Conference: October 2, 3, and 4, 2013

The specific topics include, but are not limited to:

  • Similarity queries – k-NN, range, reverse NN, top-k, etc.
  • Similarity operations – joins, ranking, classification, categorization, filtering, etc.
  • Evaluation techniques for similarity queries and operations
  • Merging/combining multiple similarity modalities
  • Cost models and analysis for similarity data processing
  • Scalability issues and high-performance similarity data management
  • Feature extraction for similarity-based data findability
  • Test collections and benchmarks
  • Performance studies, benchmarks, and comparisons
  • Similarity Search over outsourced data repositories
  • Similarity search cloud services
  • Languages for similarity databases
  • New modes of similarity for complex data understanding
  • Applications of similarity-based operations
  • Image, video, voice, and music (multimedia) retrieval systems
  • Similarity for forensics and security
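
For readers new to the first topic in that list, the two most common similarity queries can be sketched with a brute-force scan. This is only a minimal illustration and not anything taken from SISAP: the function names `knn_query` and `range_query`, the choice of Euclidean distance, and the sample points are all assumptions made for the example.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_query(data, query, k, dist=euclidean):
    """Return the k objects closest to `query`, nearest first (linear scan)."""
    return heapq.nsmallest(k, data, key=lambda obj: dist(obj, query))


def range_query(data, query, radius, dist=euclidean):
    """Return every object whose distance to `query` is at most `radius`."""
    return [obj for obj in data if dist(obj, query) <= radius]


# Hypothetical sample data, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
nearest = knn_query(points, (0.9, 0.9), k=2)
within = range_query(points, (0.0, 0.0), radius=1.5)
```

Both functions compute the distance to every object, which is exactly the cost that the indexing techniques on the conference agenda try to avoid.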

You should be able to find one or more topics that interest you. ;-)

How similar must two or more references to an entity be before they identify the same entity?

Or for that matter, is similarity an association between two or more references?

On nonmetric similarity search problems in complex domains

Saturday, February 25th, 2012

On nonmetric similarity search problems in complex domains by Tomáš Skopal and Benjamin Bustos.

Abstract:

The task of similarity search is widely used in various areas of computing, including multimedia databases, data mining, bioinformatics, social networks, etc. In fact, retrieval of semantically unstructured data entities requires a form of aggregated qualification that selects entities relevant to a query. A popular type of such a mechanism is similarity querying. For a long time, the database-oriented applications of similarity search employed the definition of similarity restricted to metric distances. Due to its topological properties, metric similarity can be effectively used to index a database which can then be queried efficiently by so-called metric access methods. However, together with the increasing complexity of data entities across various domains, in recent years there appeared many similarities that were not metrics—we call them nonmetric similarity functions. In this article we survey domains employing nonmetric functions for effective similarity search, and methods for efficient nonmetric similarity search. First, we show that the ongoing research in many of these domains requires complex representations of data entities. Simultaneously, such complex representations allow us to model also complex and computationally expensive similarity functions (often represented by various matching algorithms). However, the more complex similarity function one develops, the more likely it will be a nonmetric. Second, we review state-of-the-art techniques for efficient (fast) nonmetric similarity search, concerning both exact and approximate search. Finally, we discuss some open problems and possible future research trends.
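
The metric/nonmetric distinction in the abstract turns largely on the triangle inequality, which metric access methods rely on to skip objects without computing their distance to the query. A minimal sketch, assuming one-dimensional sample points and using squared Euclidean distance as a simple stand-in for a nonmetric function (the helper name `violates_triangle_inequality` is hypothetical, and none of this comes from the survey itself):

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance: a metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance: symmetric and non-negative, but not a metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violates_triangle_inequality(dist, objects, tol=1e-12):
    """True if some ordered triple has dist(a, c) > dist(a, b) + dist(b, c)."""
    return any(
        dist(a, c) > dist(a, b) + dist(b, c) + tol
        for a, b, c in itertools.permutations(objects, 3)
    )


# Hypothetical sample: three collinear points.
sample = [(0.0,), (1.0,), (2.0,)]
euclidean_violates = violates_triangle_inequality(euclidean, sample)
squared_violates = violates_triangle_inequality(squared_euclidean, sample)
```

On this sample the squared distance from 0 to 2 is 4, while the two legs through 1 sum to only 2, so the check flags it. Finding no violation on a sample does not prove that a function is a metric; it only fails to find a counterexample.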

The first paragraph of the conclusion of this survey on nonmetric similarity is an argument for topic maps (or at least the result of using a topic map):

In this article, we have surveyed the current situation concerning the employment of nonmetric similarity functions for effective and efficient similarity search in complex domains. One of the main results of the article is a surprising revelation that nonmetric similarity measuring is widely used in isolated domains, spanning many areas of interdisciplinary research. This includes multimedia databases, time series, and medical, scientific, chemical, and bioinformatic tasks, among others. (emphasis added)

True enough, survey articles such as this one may tempt a few researchers, and possibly graduate students, to peek over the discipline walls, however briefly. But research articles need to routinely cite the literature of other disciplines, betraying a current awareness of other fields, both to take advantage of advances in those fields and to serve as an example for the next generation of researchers.