Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 17, 2010

Google Books Ngram Viewer

Filed under: Dataset,Software — Patrick Durusau @ 4:33 pm


From the website:

Scholars interested in topics such as philosophy, religion, politics, art and language have employed qualitative approaches such as literary and critical analysis with great success. As more of the world’s literature becomes available online, it’s increasingly possible to apply quantitative methods to complement that research. So today Will Brockman and I are happy to announce a new visualization tool called the Google Books Ngram Viewer, available on Google Labs. We’re also making the datasets backing the Ngram Viewer, produced by Matthew Gray and intern Yuan K. Shen, freely downloadable so that scholars will be able to create replicable experiments in the style of traditional scientific discovery.

Since 2004, Google has digitized more than 15 million books worldwide. The datasets we’re making available today to further humanities research are based on a subset of that corpus, weighing in at 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish. The datasets contain phrases of up to five words with counts of how often they occurred in each year.

Tracing shifts in language usage should help topic map designers build maps for historical materials, maps that need less correction by users.
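To make the idea of tracing a shift concrete, here is a minimal sketch in Python that works from per-year counts like those the announcement describes. It rests on assumptions the post does not confirm: that each record is a tab-separated line beginning with the n-gram, the year, and a match count (any further columns are ignored), and the helper names `counts_by_year` and `first_year_at_least` are my own. The sample rows are invented for illustration and are not real counts.

```python
from collections import defaultdict

def parse_ngram_lines(lines):
    """Yield (ngram, year, match_count) from tab-separated records.

    Assumed layout (not confirmed by the post): ngram, year, match count,
    then possibly more columns, which are ignored here.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2])

def counts_by_year(lines, target):
    """Sum the match counts for one n-gram, keyed by year."""
    totals = defaultdict(int)
    for ngram, year, count in parse_ngram_lines(lines):
        if ngram == target:
            totals[year] += count
    return dict(totals)

def first_year_at_least(totals, threshold):
    """Return the earliest year whose count reaches threshold, or None."""
    for year in sorted(totals):
        if totals[year] >= threshold:
            return year
    return None

# Invented sample rows, for illustration only.
sample = [
    "radio\t1900\t3\t3\t2",
    "radio\t1920\t50\t40\t12",
    "radio\t1920\t5\t5\t1",
    "wireless\t1900\t20\t18\t9",
    "wireless\t1920\t30\t25\t10",
]

radio = counts_by_year(sample, "radio")
print(radio)
print(first_year_at_least(radio, 10))
```

A topic map designer could run something along these lines over competing names for the same subject to see roughly when one term overtook another, and choose the names for a given period accordingly.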

One wonders whether the extracts can be traced back to particular works.

That would enable a map developed for these extracts to be used with the scanned texts themselves.
