Intrinsic vs. Extrinsic Structure by Jesse Johnson.
From the post:
At this point, I think it will be useful to introduce an idea from geometry that is very helpful in pure mathematics, and that I find helpful for understanding the geometry of data sets. This idea is difference between the intrinsic structure of an object (such as a data set) and its extrinsic structure. Have you ever gone into a building, walked down a number of different halls and through different rooms, and when you finally got to where you’re going and looked out the window, you realized that you had no idea which direction you were facing, or which side of the building you were actually on? The intrinsic structure of a building has to do with how the rooms, halls and staircases connect up to each other. The extrinsic structure is how these rooms, halls and staircases sit with respect to the outside world. So, while you’re inside the building you may be very aware of the intrinsic structure, but completely lose track of the extrinsic structure.
You can see a similar distinction with subway maps, such as the famous London tube map. This map records how the different tube stops connect to each other, but it distorts how the stops sit within the city. In other words, the coordinates on the tube map do not represent the physical/GPS coordinates of the different stops. But while you’re riding a subway, the physical coordinates of the different stops are much less important than the inter-connectivity of the stations. In other words, the intrinsic structure of the subway is more important (while you’re riding it) than the extrinsic structure. On the other hand, if you were walking through a city, you would be more interested in the extrinsic structure of the city since, for example, that would tell you the distance in miles (or kilometers) between you and your destination.
Data sets also have both intrinsic and extrinsic structure, though there isn’t a sharp line between where the intrinsic structure ends and the extrinsic structure begins. These are more intuitive terms than precise definitions. In the figure below, which shows three two-dimensional data sets, the set on the left has an intrinsic structure very similar to that of the middle data set: Both have two blobs of data points connected by a narrow neck of data points. However, in the data set on the left the narrow neck forms a roughly straight line. In the center, the tube curves around, so that the entire set roughly follows a circle.
I am looking forward to this series of posts from Jesse.
Bearing in mind that the structure of a data set is impacted by collection goals, methods, and other factors.
Matters that are not (usually) represented in the data set per se.