Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 19, 2012

Ngram Viewer 2.0 [String Usage != Semantic Usage]

Filed under: GoogleBooks,Natural Language Processing,Ngram Viewer — Patrick Durusau @ 3:32 pm

Ngram Viewer 2.0 by Jon Orwant.

From the post:

Since launching the Google Books Ngram Viewer, we’ve been overjoyed by the public reception. Co-creator Will Brockman and I hoped that the ability to track the usage of phrases across time would be of interest to professional linguists, historians, and bibliophiles. What we didn’t expect was its popularity among casual users. Since the launch in 2010, the Ngram Viewer has been used about 50 times every minute to explore how phrases have been used in books spanning the centuries. That’s over 45 million graphs created, each one a glimpse into the history of the written word. For instance, comparing flapper, hippie, and yuppie, you can see when each word peaked:

(graphic omitted)

Meanwhile, Google Books reached a milestone, having scanned 20 million books. That’s approximately one-seventh of all the books published since Gutenberg invented the printing press. We’ve updated the Ngram Viewer datasets to include a lot of those new books we’ve scanned, as well as improvements our engineers made in OCR and in hammering out inconsistencies between library and publisher metadata. (We’ve kept the old dataset around for scientists pursuing empirical, replicable language experiments such as the ones Jean-Baptiste Michel and Erez Lieberman Aiden conducted for our Science paper.)

Tracking the usage of phrases through time is no mean feat, but tracking their semantics would be far more useful.

For example, “freedom of speech” did not have the same “semantic” in the early history of the United States that it does today. Otherwise, how would you explain criminal statutes against blasphemy and their enforcement after the ratification of the US Constitution? (I have not verified this, but Wikipedia, Blasphemy Law in the United States, reports a person being jailed for blasphemy in the 1830s.)

Or consider the guarantee of “freedom of speech” in Article 125 of the 1936 Constitution of the USSR.

Those three usages, current United States, early United States, USSR 1936 (English translation), don’t have the same semantics to me.

You?
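
To make the string-versus-semantics point concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy records, their hand-assigned sense labels, and the function names are invented, and none of it comes from the Ngram Viewer datasets or its interface. It only shows that counting by string merges usages that a sense-aware count would keep apart.

```python
from collections import Counter

# Hypothetical toy records: (year, sense_label, text).
# The texts and the sense labels are invented for illustration only.
RECORDS = [
    (1800, "early-US", "the freedom of speech was held not to protect blasphemy"),
    (1936, "USSR-1936", "citizens are guaranteed freedom of speech by law"),
    (2012, "current-US", "freedom of speech protects unpopular opinions"),
    (2012, "current-US", "a debate over freedom of speech online"),
]

PHRASE = "freedom of speech"


def string_counts(records, phrase):
    """Count occurrences of the phrase per year, looking only at the string."""
    counts = Counter()
    for year, _sense, text in records:
        counts[year] += text.lower().count(phrase)
    return counts


def sense_counts(records, phrase):
    """Count occurrences per (year, sense), keeping distinct usages apart."""
    counts = Counter()
    for year, sense, text in records:
        counts[(year, sense)] += text.lower().count(phrase)
    return counts


if __name__ == "__main__":
    # String view: one line on the graph, whatever the phrase meant.
    for year, n in sorted(string_counts(RECORDS, PHRASE).items()):
        print(year, n)
    # Sense view: the same occurrences, split by the label someone assigned.
    for (year, sense), n in sorted(sense_counts(RECORDS, PHRASE).items()):
        print(year, sense, n)
```

The hard part, of course, is the sense label itself. Here it is typed in by hand; nothing in a string count supplies it.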
