Finding New Story Links Through Blog Clustering
Matthew Hurst writes:
The basic mechanism used in track // microsoft to cluster articles is similar to that used by Techmeme. A fixed set of blogs are crawled and clustered based on specific features such as link structure and content (and in the case of Techmeme, additional human input). However, what about blogs that aren't known to the system?
I recently added a feature to track // microsoft which analyses clusters for popular URLs and adds those to the bottom of the cluster. The title of the web page is used as a simple description of the popular page.
In the recent story about Nuno Silva's mistaken comment regarding the future of Windows Phone devices, there were many links to Nuno's own blog post. In addition to the large cluster of known blogs that were determined to be talking about the story, track // microsoft also surfaced Nuno's post through analysing the popular links discovered within the cluster.
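The link analysis described above might look something like the following minimal sketch. It is not the actual track // microsoft code: the function name, the "url" and "links" keys, the min_posts threshold, and the example hostnames are all assumptions made for illustration. It counts how many posts in a cluster link to each URL, skips hosts the system already crawls, and keeps the URLs that enough posts point to. Fetching each page's title for the description is left out.

```python
from collections import Counter
from urllib.parse import urlparse


def popular_external_links(cluster_posts, known_hosts, min_posts=2):
    """Rank URLs linked from a cluster whose hosts are not already tracked.

    cluster_posts: list of dicts with hypothetical keys "url" and "links".
    known_hosts: set of hostnames the system already crawls (assumed input).
    """
    counts = Counter()
    for post in cluster_posts:
        # Count each URL at most once per post so one post cannot inflate it.
        for link in set(post["links"]):
            host = urlparse(link).netloc.lower()
            if host and host not in known_hosts:
                counts[link] += 1
    # Keep only URLs linked from at least min_posts posts, most linked first.
    return [(url, n) for url, n in counts.most_common() if n >= min_posts]


# Made-up cluster: three known blogs, all linking to one unknown blog post.
posts = [
    {"url": "http://blog-a.example/1",
     "links": ["http://unknown.example/post", "http://blog-b.example/2"]},
    {"url": "http://blog-b.example/2",
     "links": ["http://unknown.example/post"]},
    {"url": "http://blog-c.example/3",
     "links": ["http://unknown.example/post", "http://other.example/x"]},
]
known = {"blog-a.example", "blog-b.example", "blog-c.example"}

print(popular_external_links(posts, known))
```

Counting each link once per post means the ranking reflects how many separate blogs in the cluster point at a page, which is roughly the signal that surfaced Nuno's post in the story above.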
This is an interesting way to discover blogs the system doesn't already know about.