Random projection trees have proven to be effective for approximate nearest neighbor searches in high dimensional spaces where conventional methods are not applicable due to excessive usage of memory and computational time. We show that building multiple trees on the same data can improve the performance even further, without significantly increasing the total computational cost of queries when executed in a modern parallel computing environment. Our experiments identify suitable parameter values to achieve accurate searches with extremely fast query times, while also retaining a feasible complexity for index construction.
Not a quick read but an important one if you want to use multiple dimensions for calculation of similarity or sameness between two or more topics.
The technique requires you to choose a degree of similarity that works for your use case.
This paper makes a nice jumping off point for discussing how much precision does a particular topic map application need? Absolute precision is possible, but only in a limited number of cases and I suspect at high cost.
For some use cases, such as searching for possible suspects in crimes, some lack of precision is necessary to build up a large enough pool of suspects to include the guilty parties.
Any examples of precision and topic maps that come to mind?