C Reference Manual (D.M. Richie, 1974)

C Reference Manual (D.M. Richie, 1974)

I mention the C Reference Manual, now forty-three (43) years old, as encouragement to write good documentation.

It may have a longer life than you ever expected!

For example, in 1974 Richie writes:

2.2 Identifier (Names)

An identifier is a sequence of letters and digits: the first character must be alphabetic.

Which we find replicated years later in ISO/IEC 8879 : 1986 (SGML):

4.198 name: A name token whose first character is a name start character.

4.201 name start character: A character that can begin a name: letters and others designated by the concrete syntax.

And in production [53]:


name start character =
LC Letter \
UC Letter \
LCNMSTRT \
UCNMSTRT

Where Figure 1 of 9.2.1 SGML Character defines LC Letter as a-z, UC Letter as A-Z, LCNMSTRT as (none), UCNMSTRT as (none), in the concrete syntax.

And in 1997, the letter vs. digit distinction, finds its way into Extensible Markup Language (XML) 1.0.


[4] NameChar ::= Letter | Digit | ‘.’ | ‘-‘ | ‘_’ | ‘:’ | CombiningChar | Extender
[5] Name ::= (Letter | ‘_’ | ‘:’) (NameChar)*

“Letter” is a link to a production referencing all the qualifying Unicode characters which is too long to include here.

What started off as an arbitrary choice, “alphabetic” characters as name start characters in 1974, is picked up some 12 years later (1986) in ISO/IEC 8879 (SGML), both of which were bound by a restricted character set.

When the opportunity came to abandon the letter versus digit distinction in name start characters (XML 1.0), the result is a larger character repertoire for name start characters, but digits continue as second-class citizens.

Can you point to an explanation why Richie preferred alphabetic characters over digits for name start characters?

Comments are closed.