How to Visualize and Compare Distributions by Nathan Yau.
Single data points from a large dataset can make it more relatable, but those individual numbers don’t mean much without something to compare to. That’s where distributions come in.
There are a lot of ways to show distributions, but for the purposes of this tutorial, I’m only going to cover the more traditional plot types like histograms and box plots. Otherwise, we could be here all night. Plus the basic distribution plots aren’t exactly well-used as it is.
Before you get into plotting in R though, you should know what I mean by distribution. It’s basically the spread of a dataset. For example, the median of a dataset is the half-way point. Half of the values are less than the median, and the other half are greater. That’s only part of the picture.
What happens in between the maximum value and median? Do the values cluster towards the median and quickly increase? Are there a lot of values clustered towards the maximums and minimums with nothing in between? Sometimes the variation in a dataset is a lot more interesting than just the mean or median. Distribution plots help you see what’s going on.
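To make the point about the median concrete, here is a minimal sketch. The tutorial itself works in R; this uses plain Python from the standard library instead, purely as an illustration. The two datasets are made up, and the helper names `five_number_summary` and `text_histogram` are hypothetical, not taken from the tutorial. Both datasets share a median of 50, yet one clusters around it and the other piles up at the extremes with almost nothing in between.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    ordered = sorted(values)
    # quantiles() with n=4 returns the three quartile cut points.
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, med, q3, ordered[-1]


def text_histogram(values, bin_width):
    """Return one line per bin: the bin's lower edge, then one '#' per value."""
    counts = {}
    for v in values:
        edge = (v // bin_width) * bin_width  # lower edge of the bin holding v
        counts[edge] = counts.get(edge, 0) + 1
    lines = []
    edge = min(counts)
    while edge <= max(counts):
        # Empty bins are kept so gaps in the data stay visible.
        lines.append(f"{edge:>3} | " + "#" * counts.get(edge, 0))
        edge += bin_width
    return lines


# Made-up illustrative data: both lists have a median of 50.
clustered = [44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56]
split = [1, 2, 3, 4, 5, 50, 95, 96, 97, 98, 99]

for name, data in [("clustered", clustered), ("split", split)]:
    print(name, five_number_summary(data))
    print("\n".join(text_histogram(data, 10)))
```

The summaries agree on the median and disagree on everything else, and the crude text histogram shows the gap in the second dataset that the median alone hides. A real histogram or box plot, as covered in the tutorial, tells the same story with less squinting.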
You will find distributions useful in many aspects of working with topic maps.
The most obvious use is displaying data to end users at delivery. But distributions can also help you decide which areas of a dataset look more “interesting” than others.
Nathan does his typically great job explaining distributions and you will learn a bit of R in the process. Not a bad evening at all.