Using Metadata to Find Paul Revere by Kieran Healy.
From the post:
London, 1772.
I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty’s subjects. This is in connection with the discussion of the role of “metadata” in certain recent events and the assurances of various respectable parties that the government was merely “sifting through this so-called metadata” and that the “information acquired does not include the content of any communications”. I will show how we can use this “metadata” to find key persons involved in terrorist groups operating within the Colonies at the present time. I shall also endeavour to show how these methods work in what might be called a relational manner.
(…)
An extremely well-written and highly imaginative example of social network analysis.
With one flaw, a fatal one I’m afraid.
What is the first thing you notice about the data? The very first thing?
It’s clean!
Clean data is almost unknown in the real world.
Think about the last time you got into an argument with your credit card company. Or with a credit reporting bureau. Or with anyone else who collects data.
Dirty data is just a fact of life.
In a perfect world, the one software vendors and contractors imagine, perfect matches come up every time, particularly when mapping across data sets.

In their perfect world, Paul Revere never appears as P. Revere (or Revoire), never carries varying birth dates such as December 21, 1734 (Old Style) or January 1, 1735 (modern calendar), and is never listed variously as a silversmith, a dentist, or both. (Paul Revere)
To list just a few of the possible confusions.
Unfortunately, our leadership uncritically accepts claims that assume a level of data cleanliness no practicing DBA has ever seen in raw data.
Considering the billions of dollars at stake for agencies and contractors, their motives are clear.
What is puzzling is why our leadership doesn’t make that connection.