I’ve been wanting to write a post about some of the research I’ve been working on but I realized that my work has gotten really technical and not very accessible to someone not in machine learning. So I thought as a precursor I would write about spectral clustering at a very high level and with lots of pictures. I hope you can follow along. For a more technical introduction to spectral clustering, this tutorial is very nice and much more complete than this article.
To start out, it is important to understand what the clustering problem is and why it is important. Clustering is an extremely useful tool in data analysis whereby a set of objects are organized into groups based on their similarity. We typically want to form groups so that objects in the same group are highly similar and objects in different groups are less similar. This is typically known as unsupervised learning because the data set is not labeled (differentiating it from other tasks like regression and classification).
I think the point that eigenvectors are “stable under noise” could be made earlier and without a lot of technical detail.
I would insert/change:
The changes to the rows and columns introduces “noise” into the example.
The eigenvectors are $(1\ 0\ 1\ 0)^T$ and $(0\ 1\ 0\ 1)^T$, with the first and third objects are in one cluster and the second and fourth are in the other. The eigenvectors make the correct identifications of the clusters, despite the introduction of noise.
Why eigenvectors perform well in the presence of noise is beyond the scope of this post.
Still, highly recommended as an introduction to spectral clustering.