Ngram Viewer 2.0 [String Usage != Semantic Usage]

Friday, October 19th, 2012

Ngram Viewer 2.0 by Jon Orwant.

Since launching the Google Books Ngram Viewer, we’ve been overjoyed by the public reception. Co-creator Will Brockman and I hoped that the ability to track the usage of phrases across time would be of interest to professional linguists, historians, and bibliophiles. What we didn’t expect was its popularity among casual users. Since the launch in 2010, the Ngram Viewer has been used about 50 times every minute to explore how phrases have been used in books spanning the centuries. That’s over 45 million graphs created, each one a glimpse into the history of the written word. For instance, comparing flapper, hippie, and yuppie, you can see when each word peaked.

Meanwhile, Google Books reached a milestone, having scanned 20 million books. That’s approximately one-seventh of all the books published since Gutenberg invented the printing press. We’ve updated the Ngram Viewer datasets to include a lot of those new books we’ve scanned, as well as improvements our engineers made in OCR and in hammering out inconsistencies between library and publisher metadata. (We’ve kept the old dataset around for scientists pursuing empirical, replicable language experiments such as the ones Jean-Baptiste Michel and Erez Lieberman Aiden conducted for our Science paper.)

Tracking the usage of phrases through time is no mean feat, but tracking their semantics would be far more useful.

For example, “freedom of speech” did not have the same “semantic” in the early history of the United States that it does today. Otherwise, how would you explain criminal statutes against blasphemy and their enforcement after the ratification of the US Constitution? (I have not verified this, but the Wikipedia article “Blasphemy law in the United States” reports a person being jailed for blasphemy in the 1830s.)

Or consider the guarantee of “freedom of speech” in Article 125 of the 1936 Constitution of the USSR.

Those three usages (current United States, early United States, and the USSR in 1936, in English translation) don’t have the same semantics to me.


search Google Books by ISSN

Wednesday, November 16th, 2011

Turns out Google Books does support searching by ISSN, using ordinary fielded search syntax, although I don’t believe it’s documented anywhere.

Mostly what you’ll find is digitized bound journals from libraries (that is, digitization of some volumes of the journal, probably not all of them, which may or may not have full text access). Sometimes things that physically look like monographs but are published serially also get ISSNs, so you might get some of those too; I’m not sure. The item has to be in GBS (Google Book Search), and GBS has to have ISSN metadata for the record; I’m not sure how often that happens.
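
For anyone who wants to try the fielded search described above, here is a minimal Python sketch of building such a query. The post does not give the exact syntax, so two things here are assumptions: that the field prefix is `issn:` (by analogy with the documented `isbn:` field), and that the search URL and its `tbm=bks` parameter are a reasonable way to reach Google Books search. The function names are hypothetical helpers, not part of any Google library.

```python
import re
from urllib.parse import urlencode

# Assumption: the undocumented field prefix is "issn", mirroring "isbn".
ISSN_FIELD = "issn"
# Assumption: Google web search restricted to books; swap in another
# endpoint if this one does not behave as expected.
BOOKS_SEARCH_URL = "https://www.google.com/search"


def normalize_issn(raw: str) -> str:
    """Return an ISSN in the usual NNNN-NNNC form, or raise ValueError."""
    compact = raw.replace("-", "").replace(" ", "").upper()
    # Seven digits followed by a check character (a digit or X).
    if not re.fullmatch(r"[0-9]{7}[0-9X]", compact):
        raise ValueError(f"not a well-formed ISSN: {raw!r}")
    return f"{compact[:4]}-{compact[4:]}"


def issn_query(raw: str) -> str:
    """Build the fielded query string, e.g. 'issn:0028-0836'."""
    return f"{ISSN_FIELD}:{normalize_issn(raw)}"


def issn_search_url(raw: str) -> str:
    """Build a full search URL with the query URL-encoded."""
    params = {"tbm": "bks", "q": issn_query(raw)}
    return f"{BOOKS_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(issn_query("00280836"))
    print(issn_search_url("0028-0836"))
```

The sketch only checks that the ISSN is well formed; it does not verify the check digit, and it cannot tell you whether Google Books actually has ISSN metadata for a given record.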

Of particular interest to library students and librarians.

My only caution is that, like many “undocumented” features, this may or may not persist. Still, take advantage of it while it is around.

