Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 7, 2013

NSA…Verizon…Obama…
Connecting the Dots. Or not.

Why Verizon?

The first question that came to mind when the Guardian broke the NSA-Verizon news.

Here’s why I ask:

Verizon market share

(source: http://www.statista.com/statistics/199359/market-share-of-wireless-carriers-in-the-us-by-subscriptions/)

Verizon over 2011-2012 had only 34% of the cell phone market.

Unless terrorists prefer Verizon for ideological reasons, why Verizon?

Choosing only Verizon means the NSA is missing 66% of potential terrorist cell traffic.

That sounds like a bad plan.

What other reason could there be for picking Verizon?

Consider some other known players:

President Barack Obama, candidate for President of the United States, 2012.

“Bundlers” who gathered donations for Barack Obama:

Min Max Name City State Employer
$200,000 $500,000 Hill, David Silver Spring MD Verizon Communications
$200,000 $500,000 Brown, Kathryn Oakton VA Verizon Communications
$50,000 $100,000 Milch, Randal Bethesda MD Verizon Communications

(Source: OpenSecrets.org – 2012 Presidential – Bundlers)

BTW, the Max category means more money may have been given, but that is the top reporting category.

I have informally “identified” the bundlers as follows:

  • Kathryn C. Brown

    Kathryn C. Brown is senior vice president – Public Policy Development and Corporate Responsibility. She has been with the company since June 2002. She is responsible for policy development and issues management, public policy messaging, strategic alliances and public affairs programs, including Verizon Reads.

    Ms. Brown is also responsible for federal, state and international public policy development and international government relations for Verizon. In that role she develops public policy positions and is responsible for project management on emerging domestic and international issues. She also manages relations with think tanks as well as consumer, industry and trade groups important to the public policy process.

  • David A. Hill, Bloomberg Business Week reports: David A. Hill serves as Director of Verizon Maryland Inc.

    LinkedIn profile reports David A. Hill worked for Verizon, VP & General Counsel (2000 – 2006), Associate General Counsel (March 2006 – 2009), Vice President & Associate General Counsel (March 2009 – September 2011) “Served as a liaison between Verizon and the Obama Administration”

  • Randal S. Milch Executive Vice President – Public Policy and General Counsel

What is Verizon making for each data delivery? Is this cash for cash given?

If someone gave your more than $1 million (how much more is unknown), would you talk to them about such a court order?

If you read the “secret” court order, you will notice it was signed on April 23, 2013.

There isn’t a Kathryn C. Brown in Oakton in the White House visitor’s log, but I did find this record, where a “Kathryn C. Brown” made an appointment at the Whitehouse and was seen two (2) days later on the 17th of January 2013.

BROWN,KATHRYN,C,U69535,,VA,,,,,1/15/13 0:00,1/17/13 9:30,1/17/13 23:59,,176,CM,WIN,1/15/13 11:27,CM,,POTUS/FLOTUS,WH,State Floo,MCNAMARALAWDER,CLAUDIA,,,04/26/2013

I don’t have all the dots connected because I am lacking some unknown # of the players, internal Verizon communications, Verizon accounting records showing government payments, but it is enough to make you wonder about the purpose of the “secret” court order.

Was it a serious attempt at gathering data for national security reasons?

Or was it gathering data as a pretext for payments to Verizon or other contractors?

My vote goes for “pretext for payments.”

I say that because using data from different sources has always been hard.

In fact, about 60 to 80% of the time of a data analyst is spent “cleaning up data” for further processing.

The phrase “cleaning up data” is the colloquial form of “semantic impedance.”

Semantic impedance happens when the same people are known by different names in different data sets or different people are known by the same names in the same or different data sets.

Remember Kathryn Brown, of Oakton, VA? One of the Obama bundlers. Let’s use her as an example of “semantic impedance.”

The FEC has a record for Kathryn Brown of Oakton, VA.

But a search engine found:

Kathryn C. Brown

Same person? Or different?

I found another Kathryn Brown at Facebook:

And an image of Facebook Kathryn Brown:

Kathryn Brown, Facebook

And a photo from a vacation she took:

Bangkok

Not to mention the Kathryn Brown that I found at Twitter.

Kathryn Brown, Twitter

That’s only four (4) data sources and I have at least four (4) different Kathryn Browns.

Across the United States, a quick search shows 227,000 Kathryn Browns.

Remember that is just a personal name. What about different forms of addresses? Or names of employers? Or job descriptions? Or simple errors, like the 20% error rate in credit report records.

Take all the phones, plus names, addresses, employers, job descriptions, errors + other data and multiply that times 311.6 million Americans.

Can that problem be solved with petabytes of data and teraflops of processing?

Not a chance.

Remember that my identification of Kathryn “bundler” Brown with the Kathryn C. Brown of Verison was a human judgement, not an automatic rule. Nor would a computer think to check the White House visitor logs to see if another, possibly the same Kathryn C. Brown visited the White House before the secret order was signed.

Human judgement is required because all the data that the NSA has been collecting is “dirty” data, from one perspective or other. Either is is truly “dirty” in the sense of having errors or it is “dirty” in the sense it doesn’t play well with other data.

The Orwellian fearists can stop huffing and puffing about the coming eclipse of civil liberties. Those passed from view a short time after 9/11 with the passage of the Patriot Act.

That wasn’t the fault of ineffectual NSA data collection. American voters bear responsibility for the loss of civil liberties not voting leadership into office that would repeal the Patriot Act.

Ineffectual NSA data collection impedes the development of techniques that for a sanely scoped data collection effort could make a difference.

A sane scope for preventing terrorist attacks could be starting with a set of known or suspected terrorist phone numbers. Using all phone data (not just from Obama contributors), only numbers contacting or being contacted by those numbers would be subject to further analysis.

Using that much smaller set of phone numbers as identifiers, we could then collect other data, such as names and addresses associated with that smaller set of phone numbers. That doesn’t make the data any cleaner but it does give us a starting point for mapping “dirty” data sets into our starter set.

The next step would be create mappings from other data sets. If we say why we have created a mapping, others can evaluate the accuracy of our mappings.

Those tasks would require computer assistance, but they ultimately would be matters of human judgement.

Examples of such judgements exist, say for example in Palantir product line. If you watch Palantir Gotham being used to model biological relationships, take note of the results that were tagged by another analyst. And how the presenter tags additional material that becomes available to other researchers.

Computer assisted? Yes. Computer driven? No.

To be fair, human judgement is also involved in ineffectual NSA data collection efforts.

But it is human judgement that rewards sycophants and supporters, not serving the public interest.

4 Comments

  1. […] By Patrick Durusau, who consults on semantic integration and edits standards. Durusau is convener of JTC 1 SC 34/WG 3, co-editor of 13250-1 and 13250-5 (Topic Maps Introduction and Reference Model, respectively), and editor of the OpenDocument Format (ODF) standard at OASIS and ISO (ISO/IEC 26300). Originally published at Another Word for It. […]

    Pingback by Could the Verizon-NSA Metadata Collection Be a Stealth Political Kickback? « naked capitalism — June 8, 2013 @ 3:56 am

  2. […] By Patrick Durusau, who consults on semantic integration and edits standards. Durusau is convener of JTC 1 SC 34/WG 3, co-editor of 13250-1 and 13250-5 (Topic Maps Introduction and Reference Model, respectively), and editor of the OpenDocument Format (ODF) standard at OASIS and ISO (ISO/IEC 26300). Originally published atAnother Word for It. […]

    Pingback by Could the Verizon-NSA Metadata Collection Be a Stealth Political Kickback? | Fifth Estate — June 8, 2013 @ 10:30 am

  3. […] By Patrick Durusau, who consults on semantic integration and edits standards. Durusau is convener of JTC 1 SC 34/WG 3, co-editor of 13250-1 and 13250-5 (Topic Maps Introduction and Reference Model, respectively), and editor of the OpenDocument Format (ODF) standard at OASIS and ISO (ISO/IEC 26300). Originally published at Another Word for It. […]

    Pingback by Patrick Durusau: Social Security Numbers – Close Enough for a Drone Strike? « naked capitalism — June 11, 2013 @ 4:04 am

  4. […] hazards, difficulties and dangers of name matching in large data pools was explored in NSA…Verizon…Obama…Connecting the Dots. Or not., republished at Naked Capitalism as: Could the Verizon-NSA Metadata Collection Be a Stealth […]

    Pingback by SSNs: Close Enough for a Drone Strike? « Another Word For It — June 11, 2013 @ 8:28 am

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress