How Statistics lifts the fog of war in Syria by David White.
From the post:
In a fascinating talk at Strata Santa Clara in February, HRDAG’s Director of Research Megan Price explained the statistical technique she used to make sense of the conflicting information. Each of the four agencies shown in the chart above published a list of identified victims. By painstakingly linking the records between the different agencies (no simple task, given incomplete information about each victim and variations in capturing names, ages etc.), HRDAG can get a more complete sense of the total number of casualties. But the real insight comes from recognizing that some victims were reported by no agency at all. By looking at the rates at which some known victims were not reported by all of the agencies, HRDAG can estimate the number of victims that were identified by nobody, and thereby get a more accurate count of total casualties. (The specific statistical technique used was Random Forests, using the R language. You can read more about the methodology here.)
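To make the idea of estimating unreported victims concrete, here is a minimal sketch of the simplest version of that reasoning: a two-list capture-recapture estimate (Chapman's bias-corrected form of the Lincoln-Petersen estimator). This is not HRDAG's method, which works with four lists and a more elaborate methodology not reproduced here. The function name and all counts are invented for illustration, and the estimate assumes the two lists are independent and that records have already been linked correctly.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Estimate total population size from two overlapping lists.

    n1, n2: number of distinct records on each list.
    m: number of records that appear on both lists (after linkage).
    Uses Chapman's bias-corrected Lincoln-Petersen formula.
    """
    if m < 0 or m > min(n1, n2):
        raise ValueError("overlap must be between 0 and the smaller list size")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Invented counts, purely for illustration (not real casualty data).
list_a = 399   # records on hypothetical list A
list_b = 299   # records on hypothetical list B
both = 119     # records found on both lists

estimated_total = chapman_estimate(list_a, list_b, both)
observed = list_a + list_b - both          # distinct records seen by anyone
estimated_unreported = estimated_total - observed

print(estimated_total, observed, estimated_unreported)
```

The smaller the overlap between the lists relative to their sizes, the larger the estimated number of people that neither list captured.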
Caution is always advisable with government-issued data, but especially so when it arises from an armed conflict.
Record linkage, a forerunner to topic maps that is still widely used, plays a central role in collating data recorded in various ways. It isn’t possible to collate heterogeneous data without either creating a uniform set of records (record linkage) or mapping the subjects of the original records together (topic maps).
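As a rough illustration of what record linkage involves, the sketch below pairs records from two lists when their normalized names are similar and their ages, where both are known, roughly agree. It uses only Python's standard `difflib`. The field names, thresholds, helper functions, and sample records are all assumptions made up for this example; real linkage work relies on far more careful matching and review.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


def same_person(a: dict, b: dict,
                name_threshold: float = 0.85,
                age_tolerance: int = 1) -> bool:
    """Guess whether two records refer to the same person."""
    score = SequenceMatcher(None, normalize(a["name"]),
                            normalize(b["name"])).ratio()
    if score < name_threshold:
        return False
    # A missing age cannot rule a match out, so fall back to the name alone.
    if a.get("age") is None or b.get("age") is None:
        return True
    return abs(a["age"] - b["age"]) <= age_tolerance


def link(list_a: list, list_b: list) -> list:
    """Return index pairs (i, j) where list_a[i] and list_b[j] look alike."""
    return [(i, j)
            for i, a in enumerate(list_a)
            for j, b in enumerate(list_b)
            if same_person(a, b)]


# Invented records, purely for illustration.
list_a = [{"name": "Ahmad al-Hassan", "age": 34},
          {"name": "Layla Haddad", "age": None}]
list_b = [{"name": "Ahmed Al Hassan", "age": 35},
          {"name": "Omar Khalil", "age": 20}]

links = link(list_a, list_b)
print(links)
```

Comparing every pair is fine for a toy example but scales quadratically, which is one reason linkage across large lists is no simple task.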
The usual moniker, “big data,” should really be “big, homogeneous data” (BHD). If that is what you have, it works great. If that isn’t what you have, it works less well, if at all.
BTW, groups like the Human Rights Data Analysis Group (HRDAG) would have far more credibility with me if their projects list didn’t read:
- Africa
- Asia
- Europe
- Middle East
- Central America
- South America
Do you notice anyone missing from that list?
I have always thought that “human rights” included cases of:
- sexual abuse
- child abuse
- violence
- discrimination
- and any number of similar issues
I can think of another place where those conditions exist in epidemic proportions.
Can’t you?