Python NLTK/Neo4j: Analysing the transcripts of How I Met Your Mother by Mark Needham.
From the post:
After reading Emil’s blog post about dark data a few weeks ago I became intrigued about trying to find some structure in free text data and I thought How I met your mother’s transcripts would be a good place to start.
I found a website which has the transcripts for all the episodes and then having manually downloaded the two pages which listed all the episodes, wrote a script to grab each of the transcripts so I could use them on my machine.
…
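The quoted workflow (download the episode listing pages by hand, then script the download of each transcript) can be sketched with the Python standard library alone. This is not Mark Needham's script. The "episode" substring used to recognise transcript links and the example URLs are hypothetical assumptions about the site's layout, and the fetch function is passed in so the parsing step can be tried offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class EpisodeLinkParser(HTMLParser):
    """Collect absolute URLs from <a href> values containing a marker substring.

    The marker is a hypothetical guess at how transcript pages are named;
    adjust it to match the real site's URL pattern.
    """

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:  # keep first occurrence, drop duplicates
                self.links.append(absolute)


def episode_links(index_html, base_url, marker="episode"):
    """Return transcript URLs found in a manually downloaded listing page."""
    parser = EpisodeLinkParser(base_url, marker)
    parser.feed(index_html)
    parser.close()
    return parser.links


def fetch_url(url):
    """Download one page and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def download_transcripts(links, fetch=fetch_url):
    """Fetch every transcript and return a {url: page_html} mapping."""
    return {url: fetch(url) for url in links}
```

Usage would be `download_transcripts(episode_links(saved_html, base_url))`, where `saved_html` is the text of one of the saved listing pages and `base_url` is the address it was saved from (both hypothetical names). Injecting `fetch` keeps the network out of the parsing logic and makes it easy to add a delay or caching when grabbing a few hundred pages.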
Interesting intermarriage between NLTK and Neo4j. Perhaps even more so if NLTK were used to extract information from dialogue outside of fictional worlds, and Neo4j were used to model dialogue roles and the like, along with relationships and events outside of the dialogue.
Congressional hearings in the U.S. (and comparable proceedings elsewhere) would make an interesting target for analysis using NLTK and Neo4j.
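As a rough illustration of modelling dialogue roles in a graph, the sketch below reads a transcript, records who speaks immediately after whom as a crude "replied to" role, and turns the counts into parameterised Cypher statements. It assumes a hypothetical "Speaker: text" line format, and it uses a plain regular expression where NLTK's tokenizers and taggers would take over for any deeper extraction from the utterances. The `Speaker` label and `REPLIED_TO` relationship type are illustrative choices, not a schema from the original post.

```python
import re
from collections import Counter

# Hypothetical transcript convention: each spoken line looks like "Speaker: text".
# Lines such as "[Scene: ...]" do not start with a capital letter and are skipped.
LINE_PATTERN = re.compile(r"^\s*([A-Z][\w .'-]*?)\s*:\s*(.+)$")


def parse_dialogue(transcript):
    """Return a list of (speaker, utterance) pairs, ignoring non-dialogue lines."""
    turns = []
    for line in transcript.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            turns.append((match.group(1).strip(), match.group(2).strip()))
    return turns


def reply_counts(turns):
    """Count (speaker, addressee) pairs where speaker directly follows addressee."""
    counts = Counter()
    for (previous, _), (current, _) in zip(turns, turns[1:]):
        if previous != current:  # consecutive lines by one speaker are not replies
            counts[(current, previous)] += 1
    return counts


# Parameterised Cypher: one REPLIED_TO relationship per speaker pair per episode.
REPLY_QUERY = (
    "MERGE (a:Speaker {name: $speaker}) "
    "MERGE (b:Speaker {name: $addressee}) "
    "MERGE (a)-[r:REPLIED_TO {episode: $episode}]->(b) "
    "SET r.count = $count"
)


def cypher_batches(counts, episode):
    """Yield (query, parameters) pairs in a stable order."""
    for (speaker, addressee), count in sorted(counts.items()):
        yield REPLY_QUERY, {
            "speaker": speaker,
            "addressee": addressee,
            "episode": episode,
            "count": count,
        }
```

With the official `neo4j` Python driver, each pair could be executed as `session.run(query, params)` inside a `driver.session()` block. For a hearing transcript the same shape applies, with committee members and witnesses as speakers, though real transcripts will need a more careful speaker-detection step than this regular expression.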