Part of a series of posts on clustering. From the most recent post (13 June 2011):
When abstracting the clustering problem, we often assume that the data is perfectly clusterable and so we only need to find the right clusters. But what if your data is not so perfect? Maybe there’s background noise, or a few random points were added to your data by an adversary. Some clustering formulations, in particular k-center or k-means, are not stable — the addition of a single point can dramatically change the optimal clustering. For the case of k-center, if a single point x is added far away from all of the original data points, it will become its own center in the optimum solution, necessitating that the other points are only clustered with k−1 centers.
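To make the k-center point concrete, here is a minimal sketch in Python. It is not taken from the quoted post: the function names `kcenter_cost` and `optimal_kcenter` are hypothetical, the data are made-up 1-D points, and centers are restricted to the data points themselves (the discrete variant of k-center) so that the optimum can be found by brute force.

```python
from itertools import combinations

def kcenter_cost(points, centers):
    # k-center objective: the largest distance from any point to its nearest center.
    return max(min(abs(p - c) for c in centers) for p in points)

def optimal_kcenter(points, k):
    # Brute force over every k-subset of the points (assumption: centers are
    # chosen from the data). Only practical for tiny inputs.
    best = min(combinations(points, k), key=lambda cs: kcenter_cost(points, cs))
    return best, kcenter_cost(points, best)

# Two tight, well-separated groups (illustrative data).
clean = [0, 1, 2, 10, 11, 12]
clean_centers, clean_cost = optimal_kcenter(clean, 2)

# The same data plus a single far-away point.
noisy = clean + [1000]
noisy_centers, noisy_cost = optimal_kcenter(noisy, 2)

print(clean_centers, clean_cost)
print(noisy_centers, noisy_cost)
```

On the clean data each group gets its own center. Once the far-away point is added, it takes one of the two centers for itself, and the remaining center has to cover both original groups, so the cost over the original points jumps from 1 to 10.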
Clustering is relevant to topic maps in a couple of ways.
First, there are numerous collective subjects (sports teams, music groups, military units, etc.) whose members all share some characteristic by which they can be gathered together.
Second, in a very real sense, bringing together all the information about a subject could fairly be described as clustering. True, it is clustering with some extra processing thrown in, but it is still clustering, just a bit more fine-grained.
Not to mention that researchers have been working on clustering algorithms for years, and those algorithms certainly should be part of any topic map authoring tool.