Alphabetical order explained in a mere 27,817 words by David Weinberger.
From the post:
This is one of the most amazing examples I’ve seen of the complexity of even simple organizational schemes. “Unicode Collation Algorithm (Unicode Technical Standard #10)” spells out in precise detail how to sort strings in what we might colloquially call “alphabetical order.” But it’s way, way, way more complex than that.
Unicode is an international standard for how strings of characters get represented within computing systems. For example, in the familiar ASCII encoding, the letter “A” is represented in computers by the number 65. But ASCII is too limited to encode the world’s alphabets. Unicode does the job.
As the paper says, “Collation is the general term for the process and function of determining the sorting order of strings of characters” so that, for example, users can look them up on a list. Alphabetical order is a simple form of collation.
The best part is the summary of Unicode Technical Standard #10:
This document dives resolutely into the brambles and does not give up. It incidentally exposes just how complicated even the simplest of sorting tasks is when looked at in their full context, where that context is history, language, culture, and the ambiguity in which they thrive.
We all learned the meaning of “alphabetical order” in elementary school. But which “alphabetical order” depends upon language, culture, context, etc.
Other terms and phrases have the same problem. But the vast majority of them have no Unicode Technical Report with all the possible meanings.
For those terms there are topic maps.
I first saw this in a tweet by Computer Science.