Practical advice for analysis of large, complex data sets [IC tl;dr]

Practical advice for analysis of large, complex data sets by Patrick Riley.

From the post:

For a number of years, I led the data science team for Google Search logs. We were often asked to make sense of confusing results, measure new phenomena from logged behavior, validate analyses done by others, and interpret metrics of user behavior. Some people seemed to be naturally good at doing this kind of high quality data analysis. These engineers and analysts were often described as “careful” and “methodical”. But what do those adjectives actually mean? What actions earn you these labels?

To answer those questions, I put together a document shared Google-wide which I optimistically and simply titled “Good Data Analysis.” To my surprise, this document has been read more than anything else I’ve done at Google over the last eleven years. Even four years after the last major update, I find that there are multiple Googlers with the document open any time I check.

Why has this document resonated with so many people over time? I think the main reason is that it’s full of specific actions to take, not just abstract ideals. I’ve seen many engineers and analysts pick up these habits and do high quality work with them. I’d like to share the contents of that document in this blog post.

A great post, one that should be read and re-read until it becomes second nature.

I wave off the intelligence community (IC) with tl;dr because intelligence conclusions are artifacts of policy, not of fact.

The best data science practices in the world have no practical application in intelligence circles, unless they support the desired conclusions.

Rather than sully data science, intelligence communities should publish their conclusions and claim the evidence cannot be shared.

Before you leap to defend the intelligence community, recall its lies about mass surveillance of Americans, its lies about weapons of mass destruction in Iraq, and its numerous lies about US activities in Vietnam (before 50K+ Americans and millions of Vietnamese were killed).

The question to ask about American intelligence community reports isn’t whether they are lies (they are), but rather why the community is lying.

For those interested in data-driven analysis, follow Riley’s advice.
