The Amazing Mean Shift Algorithm by Larry Wasserman.
From the post:
The mean shift algorithm is a mode-based clustering method due to Fukunaga and Hostetler (1975) that is commonly used in computer vision but seems less well known in statistics.
The steps are: (1) estimate the density, (2) find the modes of the density, (3) associate each data point to one mode.
If you are puzzling over why I cited this post, it might help if you read “(3)” as:
(3) merge data points associated with one mode.
The notion that topics can only be merged on the basis of URLs, actually discrete values of any sort, is one way to think about merging. Your data may or may not admit to robust processing on that basis.
Those are all very good ways to merge topics, if and only if that works for your data.
If not, then you need to find ways that work with your data.