Data: Continuous vs. Categorical by Robert Kosara.
From the post:
Data comes in a number of different types, which determine what kinds of mapping can be used for them. The most basic distinction is that between continuous (or quantitative) and categorical data, which has a profound impact on the types of visualizations that can be used.
The main distinction is quite simple, but it has a lot of important consequences. Quantitative data is data where the values can change continuously, and you cannot count the number of different values. Examples include weight, price, profits, counts, etc. Basically, anything you can measure or count is quantitative.
Categorical data, in contrast, is for those aspects of your data where you make a distinction between different groups, and where you typically can list a small number of categories. This includes product type, gender, age group, etc.
Both quantitative and categorical data have some finer distinctions, but I will ignore those for this posting. What is more important, is: why do those make a difference for visualization?
I like the use of visualization to reinforce the notion of difference between continuous and categorical data.
Makes me wonder about using visualization to explore the use of different data types for detecting subject sameness.
It may seem trivial to use the TMDM’s sameness of subject identifiers (simple string matching) to say two or more topics represent the same subject.
But what if subject identifiers match but other properties, say gender (modeled as an occurrence), do not?
That would illustrate a mistake in the use of a subject identifier, but also a weakness in relying on a subject identifier (data type URI) for subject identity.
That data type relies only on string matching for identification purposes, which may or may not agree with your subject sameness requirements.
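To make the scenario concrete, here is a minimal sketch in Python. It is not an implementation of the TMDM; the `Topic` class, the function names, the example URI, and the property values are all hypothetical, invented for illustration. It shows two topics that count as the same subject under simple string matching of subject identifiers while a categorical property, gender, disagrees.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical, simplified stand-in for a topic (not the TMDM model)."""
    name: str
    subject_identifiers: set[str] = field(default_factory=set)
    occurrences: dict[str, str] = field(default_factory=dict)


def same_by_identifier(a: Topic, b: Topic) -> bool:
    """True if the topics share at least one subject identifier.

    The comparison is plain string equality, nothing more.
    """
    return bool(a.subject_identifiers & b.subject_identifiers)


def conflicting_occurrences(a: Topic, b: Topic) -> dict[str, tuple[str, str]]:
    """Return properties present on both topics whose values differ."""
    shared = a.occurrences.keys() & b.occurrences.keys()
    return {
        key: (a.occurrences[key], b.occurrences[key])
        for key in sorted(shared)
        if a.occurrences[key] != b.occurrences[key]
    }


# Assumed example data: same identifier string, different gender values.
t1 = Topic(
    name="Pat Smith",
    subject_identifiers={"http://example.org/id/pat-smith"},
    occurrences={"gender": "female"},
)
t2 = Topic(
    name="P. Smith",
    subject_identifiers={"http://example.org/id/pat-smith"},
    occurrences={"gender": "male"},
)

if same_by_identifier(t1, t2):
    conflicts = conflicting_occurrences(t1, t2)
    if conflicts:
        print("Identifiers match, but these properties disagree:", conflicts)
```

Whether such a conflict should block a merge, flag it for review, or be ignored depends on your own subject sameness requirements. The sketch only surfaces the disagreement that string matching alone would never see.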