Enhancing Linguistic Search with the Google Books Ngram Viewer by Slav Petrov and Dipanjan Das.
From the post:
…
With our interns Jason Mann, Lu Yang, and David Zhang, we’ve added three new features. The first is wildcards: by putting an asterisk as a placeholder in your query, you can retrieve the ten most popular replacements. For instance, what noun most often follows “Queen” in English fiction? The answer is “Elizabeth”:…
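To make the wildcard idea concrete, here is a minimal sketch of "fill in the `*` and keep the top ten", assuming an in-memory mapping from ngram strings to total counts. The function name `wildcard_top_k` and the toy counts are hypothetical; the real viewer works over the full corpus and can restrict replacements by part of speech (e.g. nouns only), which this sketch does not attempt.

```python
from collections import Counter


def wildcard_top_k(counts, query, k=10):
    """Return the k most frequent fillers for a query containing one '*' token.

    counts: hypothetical mapping from a space-separated ngram to its count.
    query:  e.g. "Queen *"; exactly one token must be '*'.
    """
    q_tokens = query.split()
    if q_tokens.count("*") != 1:
        raise ValueError("query must contain exactly one '*' token")
    star = q_tokens.index("*")

    fillers = Counter()
    for ngram, n in counts.items():
        tokens = ngram.split()
        # Only ngrams of the same length as the query can match.
        if len(tokens) != len(q_tokens):
            continue
        # Every non-wildcard position must match the query exactly.
        if all(t == q for i, (t, q) in enumerate(zip(tokens, q_tokens)) if i != star):
            fillers[tokens[star]] += n
    # most_common sorts by count, highest first.
    return fillers.most_common(k)


# Made-up counts for illustration only; these are not Ngram Viewer figures.
toy = {"Queen Elizabeth": 900, "Queen Victoria": 700, "Queen Anne": 300,
       "King George": 800, "Queen of England": 50}
print(wildcard_top_k(toy, "Queen *", k=2))
```

With these toy numbers the call returns `[('Elizabeth', 900), ('Victoria', 700)]`; the three-token "Queen of England" is skipped because it cannot fill a two-token pattern.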
Another feature we’ve added is the ability to search for inflections: different grammatical forms of the same word. (Inflections of the verb “eat” include “ate”, “eating”, “eats”, and “eaten”.) Here, we can see that the phrase “changing roles” has recently surged in popularity in English fiction, besting “change roles”, which earlier dethroned “changed roles”:
…
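Inflection search can be thought of as expanding one query into several phrases, one per grammatical form, and then looking each phrase up. The sketch below assumes a tiny hand-written inflection table and a suffix marker modelled on the viewer's `_INF` syntax; the table, the function name `expand_inflections`, and the lookup counts are all hypothetical, and the real viewer relies on its own morphological data.

```python
# Hypothetical, hand-written inflection table for illustration only.
INFLECTIONS = {
    "change": ["change", "changes", "changed", "changing"],
    "eat": ["eat", "eats", "ate", "eaten", "eating"],
}


def expand_inflections(query, table=INFLECTIONS, marker="_INF"):
    """Expand every token ending in `marker` into all of its listed forms."""
    phrases = [[]]
    for token in query.split():
        if token.endswith(marker):
            lemma = token[: -len(marker)]
            # Unknown lemmas fall back to the bare word.
            forms = table.get(lemma, [lemma])
        else:
            forms = [token]
        # Cartesian product: extend every partial phrase with every form.
        phrases = [p + [f] for p in phrases for f in forms]
    return [" ".join(p) for p in phrases]


# Made-up counts, purely to show the lookup step.
toy_counts = {"changing roles": 40, "change roles": 25, "changed roles": 10}
variants = expand_inflections("change_INF roles")
print({v: toy_counts.get(v, 0) for v in variants})
```

Keeping each expanded phrase separate, rather than summing them, matches the comparison described above, where "changing roles", "change roles", and "changed roles" are plotted against one another.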
Finally, we’ve implemented the most common feature request from our users: the ability to search for multiple capitalization styles simultaneously. Until now, searching for common capitalizations of “Mother Earth” required using a plus sign to combine ngrams (e.g., “Mother Earth + mother Earth + mother earth”), but now the case-insensitive checkbox makes it easier:
…
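The case-insensitive option does the same job as the manual "Mother Earth + mother Earth + mother earth" query: add up every capitalization variant year by year. A minimal sketch, assuming per-year raw counts stored as nested dictionaries (the data layout and the name `case_insensitive_series` are hypothetical):

```python
from collections import defaultdict


def case_insensitive_series(series, query):
    """Sum per-year counts over every ngram equal to `query` ignoring case.

    series: hypothetical mapping ngram -> {year: count}.
    """
    key = query.casefold()
    total = defaultdict(int)
    for ngram, by_year in series.items():
        if ngram.casefold() == key:
            for year, n in by_year.items():
                total[year] += n
    return dict(total)


# Made-up counts for illustration only.
toy_series = {
    "Mother Earth": {1990: 10, 2000: 12},
    "mother earth": {1990: 3},
    "mother Earth": {2000: 1},
    "Mother Goose": {1990: 99},
}
print(case_insensitive_series(toy_series, "mother earth"))
```

Summing raw counts first is safe here: every variant in a given year is divided by the same yearly total when the viewer converts counts to relative frequencies, so normalizing the sum gives the same result as summing the normalized curves.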
The ngram data sets are available for download.
As of the date of this post, the data sets go up to 5-grams in multiple languages.
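For anyone working with the downloads directly, a small reader is often the first step. This sketch assumes gzip-compressed, tab-separated records of the form `ngram`, `year`, `match_count`, `volume_count`; that layout is an assumption, so check the format notes on the download page for the version you fetch. The function names and the file path are hypothetical.

```python
import gzip
from collections import defaultdict


def parse_line(line):
    """Split one record, assumed to be: ngram \\t year \\t match_count \\t volume_count."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return ngram, int(year), int(match_count), int(volume_count)


def yearly_counts(lines, target):
    """Collect match_count per year for one ngram from an iterable of lines."""
    counts = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        ngram, year, match_count, _ = parse_line(line)
        if ngram == target:
            counts[year] += match_count
    return dict(counts)


def yearly_counts_from_file(path, target):
    """Same as yearly_counts, but streams a gzip-compressed file (hypothetical path)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return yearly_counts(f, target)
```

Streaming line by line keeps memory flat, which matters because the higher-order ngram files are large; `yearly_counts` accepts any iterable of strings, so it can be tried on a few hand-made lines before pointing it at a real file.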
Be mindful of semantic drift, the change in a word's meaning over decades or centuries. Meaning can also differ at the same point in time across social and economic strata and across work domains.