Computational Literary Analysis by Atabey Kaygun.
From the post:
Description of the problem
One of the brilliant ideas that Kurt Vonnegut came up with was that one can track the plot of a literary work using graphical methods. He did that intuitively. Today’s question is “Can we track the plot changes in a text using computational or algorithmic methods?”
Overlaps between units of texts
The basic idea is to split a text into fundamental units (whether this is a sentence, or a paragraph depends on the document) and then convert each unit into a hash table where the keys are stemmed words within the unit, and the values are the number of times these (stemmed) words appear in each unit.
My hypothesis is (and I will test that in this experiment below) that the amount of overlap (the number of common words) between two consecutive units tells us how the plot is advancing.
I will take the fundamental unit as a sentence below.
…
Clojure, Lisp, computational literary analysis, what’s there not to like? 😉
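For anyone who wants to see the mechanics of the quoted idea without reading the Clojure, here is a minimal Python sketch. It is not Atabey's code. The suffix-stripping `crude_stem` is a hypothetical stand-in for a real stemmer, the sentence splitter is a simple regular expression, and I am assuming that "the number of common words" means the count of distinct stemmed words shared by two consecutive sentences.

```python
import re
from collections import Counter


def crude_stem(word):
    # Hypothetical stand-in for a real stemmer: strip a few common
    # English suffixes, keeping at least three characters of the word.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def unit_counts(sentence):
    # One "hash table" per unit: stemmed word -> number of occurrences.
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(crude_stem(w) for w in words)


def consecutive_overlaps(text):
    # Overlap = number of distinct stemmed words shared by neighbouring units.
    tables = [unit_counts(s) for s in split_sentences(text)]
    return [len(a.keys() & b.keys()) for a, b in zip(tables, tables[1:])]


if __name__ == "__main__":
    sample = "The cat sat. The cat ran. Dogs bark loudly."
    print(consecutive_overlaps(sample))  # prints [2, 0]
```

Plotting the returned list against sentence position gives the kind of curve under discussion. A different definition of overlap, such as summing the smaller of the two counts for each shared word, would change the numbers and possibly the shape of the curve.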
Given the hypothesis:
the amount of overlap (the number of common words) between two consecutive units tells us how the plot is advancing
Atabey doesn’t say what his criterion is for “the plot advancing.” I mention that because both of the graphs he offers trail off from their highs.
If there is a plot advance, shouldn’t the respective speeches build until they peak at the end?
Or is there some more complex “plot advancement” at play?
One of the things that makes this and similar analyses useful, particularly for well-known speeches and works, is that we all “know” the high points. We have been conditioned to hear those as distinct, whereas the original hearers and readers were encountering the work for the first time.
Such tools can pry us out of the rut of prior analysis. Sometimes.