Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction by Cameron Blevins and Lincoln Mullen.
Abstract:
This article describes a new method for inferring the gender of personal names using large historical datasets. In contrast to existing methods of gender prediction that treat names as if they are timelessly associated with one gender, this method uses a historical approach that takes into account how naming practices change over time. It uses historical data to measure the likelihood that a name was associated with a particular gender based on the time or place under study. This approach generates more accurate results for sources that encompass changing periods of time, providing digital humanities scholars with a tool to estimate the gender of names across large textual collections. The article first describes the methodology as implemented in the gender package for the R programming language. It goes on to apply the method to a case study in which we examine gender and gatekeeping in the American historical profession over the past half-century. The gender package illustrates the importance of incorporating historical approaches into computer science and related fields.
An excellent introduction to the gender
package for R, historical grounding of the detection of gender by name, with the highlight of the article being the application of this technique to professional literature in American history.
It isn’t uncommon to find statistical techniques applied to texts whose authors and editors are beyond the reach of any critic or criticism.
It is less than common to find statistical techniques applied to extant members of a profession.
Kudos to both Blevins and Mullen for refinement the detection of gender and for applying that refinement publishing in American history.