Machine Learning in R: Clustering by Ricky Ho.
Clustering is a very common technique in unsupervised machine learning for discovering groups of data points that are “close-by” to each other. It is widely used in customer segmentation and outlier detection.
It is based on some notion of “distance” (the inverse of similarity) between data points and uses that to identify data points that are close-by to each other. In the following, we discuss some very basic algorithms for finding clusters, using R for the examples.
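As a rough illustration of the “distance” idea, the sketch below computes pairwise Euclidean distances with base R's `dist()`. The four points and the choice of Euclidean distance are assumptions made for this example and are not taken from the article.

```r
# Four hypothetical 2-D points: two near the origin, two near (5, 5)
pts <- matrix(c(0, 0,
                0, 1,
                5, 5,
                5, 6),
              ncol = 2, byrow = TRUE)

# dist() returns a compact lower-triangle object; as.matrix() expands it
# into a full symmetric matrix of pairwise distances
d <- as.matrix(dist(pts, method = "euclidean"))

d[1, 2]  # 1: points 1 and 2 are close, so they likely share a cluster
d[1, 3]  # sqrt(50), about 7.07: points 1 and 3 are far apart
```

Small distances suggest points that belong in the same group; a clustering algorithm turns that intuition into an actual assignment.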
Covers K-Means, Hierarchical Clustering, Fuzzy C-Means, Multi-Gaussian with Expectation-Maximization, and Density-based Clustering algorithms.
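For a feel of the first two algorithms on that list, here is a minimal sketch using only base R (`kmeans()`, `hclust()`, `cutree()`). The built-in `iris` data, the choice of three clusters, the seed, and average linkage are all assumptions picked for illustration, not necessarily what the article uses.

```r
# Use the four numeric measurements from the built-in iris data set
features <- iris[, 1:4]

# K-Means: results depend on random starting centers, so fix the seed
# and try several random starts, keeping the best one
set.seed(42)
km <- kmeans(features, centers = 3, nstart = 20)

# Hierarchical clustering: build a tree from pairwise distances,
# then cut it into three groups
hc <- hclust(dist(features), method = "average")
groups <- cutree(hc, k = 3)

# Cross-tabulate each clustering against the known species labels
table(km$cluster, iris$Species)
table(groups, iris$Species)
```

The species labels are used here only to inspect the result; neither algorithm sees them while clustering.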
Good introduction to the basics of clustering in R.