Drew Conway writes in Data Science in the U.S. Intelligence Community [1] about modeling assumptions:
For example, it is common for an intelligence analyst to measure the relationship between two data sets as they pertain to some ongoing global event. Consider, therefore, in the recent case of the democratic revolution in Egypt that an analyst had been asked to determine the relationship between the volume of Twitter traffic related to the protests and the size of the crowds in Tahrir Square. Assuming the analyst had the data hacking skills to acquire the Twitter data, and some measure of crowd density in the square, the next step would be to decide how to model the relationship statistically.
One approach would be to use a simple linear regression to estimate how Tweets affect the number of protests, but would this be reasonable? Linear regression assumes an independent distribution of observations, which is violated by the nature of mining Twitter. Also, these events happen in both time (over the course of several hours) and space (the square), meaning there would be considerable time- and spatial-dependent bias in the sample. Understanding how modeling assumptions impact the interpretations of analytical results is critical to data science, and this is particularly true in the IC.
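To make the independence point concrete, here is a small simulation. It is an illustration only, not Conway's analysis. It assumes, hypothetically, that tweet volume and crowd size are each measured hourly and behave like random walks that are generated independently of each other. A regression on the raw levels leaves residuals whose Durbin-Watson statistic is far below 2, the signature of time-dependent errors that violate the independence assumption. Differencing the series removes most of that dependence and brings the statistic back near 2.

```python
import random


def ols_fit(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residuals(x, y):
    """Residuals of the OLS fit of y on x."""
    a, b = ols_fit(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def durbin_watson(e):
    """Durbin-Watson statistic: about 2 for uncorrelated residuals,
    near 0 for strong positive autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v ** 2 for v in e)


def random_walk(n, rng):
    """Cumulative sum of Gaussian steps: each value depends on the previous one."""
    level, out = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        out.append(level)
    return out


def diff(series):
    """First differences: change from one hour to the next."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


rng = random.Random(42)
n = 500
tweets = random_walk(n, rng)  # hypothetical hourly tweet volume
crowd = random_walk(n, rng)   # hypothetical crowd measure, generated independently of tweets

dw_levels = durbin_watson(residuals(tweets, crowd))
dw_diffs = durbin_watson(residuals(diff(tweets), diff(crowd)))
print(f"Durbin-Watson, levels: {dw_levels:.2f}")
print(f"Durbin-Watson, first differences: {dw_diffs:.2f}")
```

The two series share no causal link by construction, yet the levels regression produces strongly autocorrelated errors, so any standard errors or significance tests computed under the independence assumption would be unreliable. Spatial dependence within the square would call for a similar check across locations rather than hours.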
His central point, that "understanding how modeling assumptions impact the interpretations of analytical results is critical to data science, and this is particularly true in the IC," cannot be overemphasized.
The example of Twitter traffic reveals a deeper bias in the intelligence community: if it’s measurable, it’s meaningful.
No doubt Twitter facilitated communication within communities that already existed, but that does not make it an enabling technology.
The revolution was made possible by community organizers working over decades (http://english.aljazeera.net/news/middleeast/2011/02/2011212152337359115.html) and trade unions (http://www.guardian.co.uk/commentisfree/2011/feb/10/trade-unions-egypt-tunisia).
And the revolution continued after Twitter, and then cell phones, were turned off.
Understanding such events requires investment in human intelligence and analysis, not overreliance on SIGINT. [2]
[1] IQT Quarterly, In-Q-Tel’s quarterly journal, Spring 2011 issue.
[2] That a source is technical, or has lights and bells, does not make it reliable or even useful.
PS: The Twitter traffic, such as it was, may have come primarily from news media: "Twitter, I think, is being used by news media people with computer connections, through those kind of means." From "Facebook, Twitter, and the Middle East," IEEE Spectrum, in which Steve Cherry interviews Ben Zhao, an expert on social networking performance.
Are we really interested in how news people use Twitter, even in a social movement context?