Cultural Fault Lines Determine How New Words Spread On Twitter, Say Computational Linguists
From the post:
A dialect is a particular form of language that is limited to a specific location or population group. Linguists are fascinated by these variations because they are determined both by geography and by demographics. So studying them can produce important insights into the nature of society and how different groups within it interact.
That’s why linguists are keen to understand how new words, abbreviations and usages spread on new forms of electronic communication, such as social media platforms. It is easy to imagine that the rapid spread of neologisms could one day lead to a single unified dialect of netspeak. An interesting question is whether there is any evidence that this is actually happening.
Today, we get a fascinating insight into this problem thanks to the work of Jacob Eisenstein at the Georgia Institute of Technology in Atlanta and a few pals. These guys have measured the spread of neologisms on Twitter and say they have clear evidence that online language is not converging at all. Indeed, they say that electronic dialects are just as common as ordinary ones and seem to reflect same fault lines in society.
…
Disappointment for those who thought the Net would help people overcome the curse of Babel.
When we move into new languages or means of communication, we simply take our linguistic diversity with us, like well traveled but familiar luggage.
If you think about it, the difficulties of multiple semantics for OWL same:As is another instance of the same phenomena. Semantically distinct groups assigned the same token, OWL same:As different semantics. That should not have been a surprise. But it was and it will be every time on community privileges itself to be the giver of meaning for any term.
If you want to see the background for the post in full:
Diffusion of Lexical Change in Social Media by Jacob Eisenstein, Brendan O’Connor, Noah A. Smith, Eric P. Xing.
Abstract:
Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter’s sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity — especially with regard to race — plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified “netspeak” dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English.