Gmail Email analysis with Neo4j – and spreadsheets by Rik Van Bruggen.
From the post:
A bunch of different graphistas have pointed out to me in recent months that there is something funny about Graphs and email. Specifically, about graphs and email analysis. From my work in previous years at security companies, I know that Email Forensics is actually big business. Figuring out who emails whom, about what topics, with what frequency, at what times – is important. Especially when the proverbial sh*t hits the fan and fraud comes to light – like in the Enron case. How do I get insight into email traffic? How do I know what was communicated to who? And how do I get that insight, without spending a true fortune?
The post is an important demonstration that sophisticated data analysis may originate with fairly pedestrian authoring tools, such as spreadsheets.
For the Enron emails, see: Enron Email Dataset. It is reported to hold about 0.5M messages, approximately 423 MB when tarred and gzipped.
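As a rough illustration of the "who emails whom, with what frequency" question, here is a minimal sketch in Python that counts sender-to-recipient pairs from raw message text using only the standard library. It is not Rik's method (his post uses spreadsheets and Neo4j). The function name `count_edges` and the sample messages, with their example.com addresses, are invented for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def count_edges(raw_messages):
    """Count (sender, recipient) pairs across raw RFC 822 message strings."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # getaddresses() returns (display name, address) tuples.
        senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
        recipients = [
            addr.lower()
            for _, addr in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        ]
        for sender in senders:
            for recipient in recipients:
                if sender and recipient:
                    edges[(sender, recipient)] += 1
    return edges


# Hypothetical sample messages, not taken from the Enron dataset.
SAMPLE = [
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com, Carol <carol@example.com>\n"
    "Subject: Q3 numbers\n\nSee attached.\n",
    "From: alice@example.com\n"
    "To: Bob <BOB@example.com>\n"
    "Subject: Re: Q3 numbers\n\nAny update?\n",
]

edges = count_edges(SAMPLE)
for (sender, recipient), n in edges.most_common():
    print(f"{sender} -> {recipient}: {n}")
```

Each counted pair is a candidate weighted edge in an email graph, with the count as the weight.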
The topic map question is what to do with separate graphs of:
- Enron emails,
- Enron corporate structure,
- Social relationships between Enron employees and others,
- Documents of other types interchanged or read inside of Enron,
- Travel and expense records, and,
- Phone logs inside Enron?
Graphs of any single data set can be interesting.
Merging graphs of inter-related data sets can be powerful.
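To make the merging point concrete, here is a minimal sketch, assuming each graph is a plain list of (source, relationship, target) triples and that someone has already decided which identifiers in the separate graphs name the same person. The `merge_graphs` function, the `SAME_AS` table, and all names, addresses, and phone extensions below are hypothetical. Building that identity table is the hard part, and the sketch does not attempt it.

```python
from collections import defaultdict


def merge_graphs(graphs, same_as):
    """Merge edge lists, mapping each node to a canonical identifier first."""
    merged = defaultdict(set)
    for edges in graphs:
        for src, rel, dst in edges:
            # Nodes without an entry in same_as keep their own identifier.
            s = same_as.get(src, src)
            d = same_as.get(dst, dst)
            merged[(s, d)].add(rel)
    return dict(merged)


# Hypothetical fragments of three separate graphs.
email_graph = [("alice@example.com", "EMAILED", "bob@example.com")]
org_graph = [("Bob B.", "REPORTS_TO", "Alice A.")]
phone_graph = [("x1234", "CALLED", "x5678")]

# Hypothetical identity table: which identifiers name the same subject.
SAME_AS = {
    "alice@example.com": "person:alice",
    "Alice A.": "person:alice",
    "x1234": "person:alice",
    "bob@example.com": "person:bob",
    "Bob B.": "person:bob",
    "x5678": "person:bob",
}

merged = merge_graphs([email_graph, org_graph, phone_graph], SAME_AS)
for (src, dst), rels in sorted(merged.items()):
    print(src, "->", dst, sorted(rels))
```

Without the identity table the three graphs share no nodes at all. With it, the email, the phone call, and the reporting line all attach to the same two people.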