Big Data on the Big Data Conversation: Tracking the NSA Story by Nicholas Hartman.
From the post:
Recent revelations regarding the National Security Agency’s (NSA) extensive data interception and monitoring practices (aka PRISM) have brought a branch of “Big Data’s” research into the broader public light. The basic premise of such work is that computer algorithms can study vast quantities of digitized communication interactions to identify potential activities and persons of interest for national security purposes.
A few days ago we wondered what could be found by applying such Big Data monitoring of communications to track the conversational impact of the NSA story on broader discussions about Big Data. This brief technical note highlights some of our most basic findings.
Our communication analytics work is usually directed at process optimization and risk management. However, in this case we applied some of the most basic components of our analytics tools towards public social media conversations—specifically tweets collected via Twitter’s streaming API. Starting last summer, we devoted a small portion of our overall analytical compute resources towards monitoring news and social media sites for evolving trends within various sectors including technology. A subset of this data containing tweets on topics related to Big Data is analyzed below.
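The keyword-level monitoring described above can be sketched in miniature: filter a tweet stream down to the "Big Data" conversation, then measure, day by day, what share of it also touches NSA/PRISM terms. The sample tweets, term list, and `daily_nsa_share` function below are illustrative assumptions, not Hartman's actual pipeline or data.

```python
from collections import Counter
from datetime import date

# Hypothetical stand-ins for records pulled from Twitter's streaming API;
# the post's real corpus is not public.
SAMPLE_TWEETS = [
    ("Big data analytics is transforming retail", date(2013, 6, 1)),
    ("PRISM shows how the NSA uses big data", date(2013, 6, 7)),
    ("NSA surveillance and big data: a PRISM debate", date(2013, 6, 8)),
    ("New Hadoop release speeds up big data jobs", date(2013, 6, 8)),
]

# Assumed marker terms for the NSA story within the big-data conversation.
NSA_TERMS = ("nsa", "prism", "surveillance")

def daily_nsa_share(tweets):
    """For each day, return the fraction of big-data tweets that also
    mention NSA/PRISM-related terms."""
    totals, nsa_hits = Counter(), Counter()
    for text, day in tweets:
        lowered = text.lower()
        if "big data" not in lowered:
            continue  # keep only the big-data conversation
        totals[day] += 1
        if any(term in lowered for term in NSA_TERMS):
            nsa_hits[day] += 1
    return {day: nsa_hits[day] / totals[day] for day in totals}
```

On the sample above, the NSA share of the big-data conversation jumps from 0% on June 1 to 100% on June 7, echoing (in toy form) the conversational spike the post tracks.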
Interesting analysis of big data (communications) relative to the NSA’s PRISM project.
While spotting trends in data is important, have you noticed that leaks are the acts of single individuals?
If there is a pattern in mass data that precedes leaking, detecting it hasn't been very effective at stopping leaks.
Rather than crunching mass data, shouldn’t pattern matching be applied to the 1,409,969 Americans who hold top secret security clearance? (Link is to the 2012 Report on Security Clearance Determinations.)
Targeted data crunching as opposed to random data crunching.