Visualizing Lexical Novelty in Literature by Matthew Hurst.
From the post:
Novels are full of new characters, new locations and new expressions. The discourse between characters involves new ideas being exchanged. We can get a hint of this by tracking the introduction of new terms in a novel. In the below visualizations (in which each column represents a chapter and each small block a paragraph of text), I maintain a variable which represents novelty. When a paragraph contains more than 25% new terms (i.e. words that have not been observed thus far) this variable is set at its maximum of 1. Otherwise, the variable decays. The variable is used to colour the paragraph with red being 1.0 and blue being 0. The result is that we can get an idea of the introduction of new ideas in novels.
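A rough sketch of the scheme described in the quote, in Python. The post does not say how terms are tokenized, whether the 25% is counted over distinct terms or all tokens, or how fast the variable decays, so the regex tokenizer, the distinct-term ratio, the multiplicative `decay` factor and the linear red/blue blend below are all assumptions for illustration, not Hurst's actual implementation.

```python
import re


def novelty_scores(paragraphs, threshold=0.25, decay=0.5):
    """Return one novelty value in [0, 1] per paragraph.

    threshold comes from the post (more than 25% new terms).
    decay is a hypothetical multiplicative factor; the post only
    says the variable "decays" without giving a rate.
    """
    seen = set()      # every term observed so far in the text
    novelty = 0.0     # the running novelty variable
    scores = []
    for paragraph in paragraphs:
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        terms = set(re.findall(r"[a-z']+", paragraph.lower()))
        new_terms = terms - seen
        # Assumed ratio: new distinct terms over all distinct terms.
        fraction_new = len(new_terms) / len(terms) if terms else 0.0
        if fraction_new > threshold:
            novelty = 1.0          # reset to the maximum
        else:
            novelty *= decay       # otherwise let it decay
        seen |= terms
        scores.append(novelty)
    return scores


def novelty_to_rgb(score):
    """Map 1.0 to pure red and 0.0 to pure blue (assumed linear blend)."""
    return (round(255 * score), 0, round(255 * (1 - score)))


if __name__ == "__main__":
    sample = ["the cat sat", "the cat sat", "the cat sat down", "a dog ran off"]
    for text, score in zip(sample, novelty_scores(sample)):
        print(f"{score:.2f} {novelty_to_rgb(score)} {text}")
```

Feeding it the paragraphs of one chapter at a time, while keeping `seen` across chapters, would give the per-column blocks described in the post.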
As always, interesting ideas on text visualization from Matthew Hurst.
I am curious how much novelty (change?) you would see between SEC filings from the same law firm. Or, put another way, how much boilerplate is there in regulatory filings? I am mindful of the disaster plan for BP that included saving polar bears in the Gulf of Mexico.
Another interesting tool for exploring data and data sets in preparation for creating topic maps.