This isn’t a new idea, but it occurred to me that introducing readers to “dimensions of subject identification” might be an easier on-ramp for topic maps. It enables us to dodge the sticky issues of “identity” in favor of asking: what do you want to talk about, and how many dimensions do you want or need to identify it?
To start with a classic example, if we only have one dimension and the string “Paris,” ambiguity is destined to follow.
If we add a country dimension, giving us two dimensions, “Paris” + “France” can be distinguished from all other uses of “Paris” by the string + country dimensions.
The string + country dimensions fare less well for “Paris” + country = “United States”:
- Paris, Arkansas, a city
- Paris, Idaho, a city
- Paris, Illinois, a city
- Paris, Indiana, an unincorporated community
- Paris, Iowa, an unincorporated community
- Paris, Kentucky, a city
- Paris, Maine, a town
- Paris, an unincorporated community in Green Charter Township, Michigan
- Paris, Mississippi, an unincorporated community
- Paris, Missouri, a city
- Paris, New Hampshire, an unincorporated community
- Paris, New York, a town
- Paris, Portage County, Ohio, an unincorporated community
- Paris, Stark County, Ohio, an unincorporated community
- Paris, Oregon, an unincorporated community
- Paris, Pennsylvania, a census-designated place
- Paris, Tennessee, a city
- Paris, Texas, a city
- Paris, Virginia, an unincorporated community
- Paris, Wisconsin (disambiguation), several Wisconsin localities
- Paris Township (disambiguation), several US localities
- Beresford, South Dakota, a city formerly called Paris
- Loraine, California, an unincorporated community formerly called Paris
- Paris Mountain, South Carolina – see Paris Mountain State Park
- Paris Mountain, Virginia
For the United States you need “Paris” + country + state dimensions, at a minimum, but that leaves you with two instances of Paris in Ohio.
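The narrowing described above can be sketched in a few lines of Python. This is only an illustration, not a topic maps implementation: the field names (`name`, `country`, `state`, `county`) and the helper `candidates` are hypothetical, and the records are a small subset of the list above.

```python
# A handful of subjects from the list above, each described by named dimensions.
# The dimension names are assumptions chosen for this sketch.
PLACES = [
    {"name": "Paris", "country": "France"},
    {"name": "Paris", "country": "United States", "state": "Texas"},
    {"name": "Paris", "country": "United States", "state": "Ohio",
     "county": "Portage County"},
    {"name": "Paris", "country": "United States", "state": "Ohio",
     "county": "Stark County"},
]


def candidates(places, **dimensions):
    """Return every record that agrees with all of the supplied dimensions."""
    return [
        place for place in places
        if all(place.get(key) == value for key, value in dimensions.items())
    ]


# Each added dimension narrows the set of subjects the values could refer to.
print(len(candidates(PLACES, name="Paris")))
print(len(candidates(PLACES, name="Paris", country="United States")))
print(len(candidates(PLACES, name="Paris", country="United States", state="Ohio")))
print(len(candidates(PLACES, name="Paris", country="United States", state="Ohio",
                     county="Stark County")))
```

In this sketch a subject counts as identified only when the supplied dimensions leave exactly one candidate, which for Ohio requires a fourth dimension such as county.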
One advantage of speaking of “dimensions of subject identification” is that we can order systems of subject identification by the number of dimensions they offer, and examine the consequences of the choice of dimensions.
One-dimensional systems, that is, a solitary string such as "Paris," as we said above, leave users with no means to distinguish one use from another. They are useful and common in CSV files or database tables, but risk ambiguity and are difficult to communicate accurately to others.
Two-dimensional systems, that is, city = "Paris," enable users to distinguish usages other than for city, but as you can see from the Paris example in the U.S., that may not be sufficient.
Moreover, city itself may be a subject identified by multiple dimensions, as different governmental bodies define “city” differently.
Just as some information systems only use one-dimensional strings for headers, other information systems may use one-dimensional strings for the subject city in city = "Paris."
But multiple dimensions of identification can be captured for any subject in any system, separately from the system itself.
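One way to picture keeping dimensions apart from the system is a sidecar description of the headers. The sketch below is a hypothetical illustration: the mapping `HEADER_DIMENSIONS`, its keys, the placeholder value for the defining body, and the helper `undocumented_headers` are all assumptions, not part of any topic maps standard.

```python
import csv
import io

# Hypothetical sidecar: dimensions that identify each header, stored apart
# from the data itself. The "defined_by" value is a placeholder, not a real body.
HEADER_DIMENSIONS = {
    "city": {"term": "city", "defined_by": "example-government-body"},
}

# A tiny CSV whose headers are one-dimensional strings.
SAMPLE = "city,country\nParis,France\n"


def undocumented_headers(csv_text, header_dimensions):
    """Return the headers that have no identifying dimensions recorded."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)  # the first row holds the header strings
    return [header for header in headers if header not in header_dimensions]


print(undocumented_headers(SAMPLE, HEADER_DIMENSIONS))  # ['country']
```

The CSV file is untouched; the extra dimensions live alongside it, so the same approach could be applied to a system that only ever stores the bare string.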
Perhaps the most useful aspect of dimensions of identification is enabling users to ask their information architects which dimensions, and which values, serve to identify subjects in information systems.
Such as the headers in database tables or spreadsheets. 😉