Topic Modeling Sarah Palin’s Emails from Edwin Chen.
From the post:
LDA-based Email Browser
Earlier this month, several thousand emails from Sarah Palin’s time as governor of Alaska were released. The emails weren’t organized in any fashion, though, so to make them easier to browse, I did some topic modeling (in particular, using latent Dirichlet allocation) to separate the documents into different groups.
An interesting analysis, with the promise of more to follow.
With a US presidential election next year, there will no doubt be floods of documents, friendly as well as hostile.
Time to sharpen your data extraction tools.
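For anyone who wants to try the grouping step described in the quoted post, here is a minimal sketch of latent Dirichlet allocation using collapsed Gibbs sampling in plain Python. It is not Edwin Chen's code, which is not reproduced here. The toy messages are invented placeholders rather than text from the released emails, and the function names, the hyperparameters (alpha, beta), the topic count and the iteration count are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def dominant_topic(counts):
    """Index of the topic with the most tokens assigned in one document."""
    return max(range(len(counts)), key=counts.__getitem__)


# Invented placeholder messages, NOT text from the released emails.
messages = [
    "pipeline gas oil pipeline budget",
    "oil gas drilling lease pipeline",
    "school education budget teachers school",
    "teachers education school funding",
]
docs = [m.lower().split() for m in messages]

doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
groups = [dominant_topic(counts) for counts in doc_topic]

for message, group in zip(messages, groups):
    print(group, message)
for k, words in enumerate(topic_word):
    print("topic", k, [w for w, _ in words.most_common(3)])
```

Assigning each message to its dominant topic is only one way to turn the model into browsable groups. On a real corpus you would also want stop-word removal, far more sampling sweeps, and some experimentation with the number of topics.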