Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 29, 2013

Sanity Checks

Filed under: Data,Data Analysis — Patrick Durusau @ 3:11 pm

Being paranoid about data accuracy! by Kunal Jain.

Kunal knew he was in for a long meeting after this opening exchange:

Kunal: How many rows do you have in the data set?

Analyst 1: (After going through the data set) X rows

Kunal: How many rows do you expect?

Analyst 1 & 2: Blank look at their faces

Kunal: How many events / data points do you expect in the period / every month?

Analyst 1 & 2: …. (None of them had a clue)
The number of rows in the data set looked higher to me. The analysts had missed it clearly, because they did not benchmark it against business expectation (or did not have it in the first place). On digging deeper, we found that some events had multiple rows in the data sets and hence the higher number of rows.
….
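The check Kunal describes, comparing the actual row count with the expected one and looking for events that occupy more than one row, is easy to automate. What follows is only a minimal sketch in Python and not Kunal's method: the function name `row_count_check`, the `event_id` key, the 10% tolerance and the sample rows are all assumptions made up for illustration.

```python
from collections import Counter


def row_count_check(rows, key, expected_rows, tolerance=0.10):
    """Compare the actual row count with an expected count and list duplicated keys.

    `rows` is a list of dicts, `key` names the field that should identify one
    event, and `tolerance` is the accepted relative deviation (an assumption).
    """
    if expected_rows <= 0:
        raise ValueError("expected_rows must be positive")

    actual = len(rows)
    # Relative deviation from the business expectation.
    deviation = (actual - expected_rows) / expected_rows
    # Count how often each event key appears; more than once means extra rows.
    counts = Counter(row[key] for row in rows)
    duplicates = {k: n for k, n in counts.items() if n > 1}

    return {
        "actual_rows": actual,
        "expected_rows": expected_rows,
        "within_tolerance": abs(deviation) <= tolerance,
        "distinct_events": len(counts),
        "duplicated_events": duplicates,
    }


# Made-up sample: three events, one of them recorded twice.
sample = [
    {"event_id": "e1", "amount": 10},
    {"event_id": "e2", "amount": 20},
    {"event_id": "e2", "amount": 20},
    {"event_id": "e3", "amount": 30},
]
report = row_count_check(sample, key="event_id", expected_rows=3)
print(report)
```

The expected count has to come from somewhere other than the data set itself, such as the business's own estimate of events per month. Without that number, the check has nothing to compare against, which is the point of Kunal's questions.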

You have probably seen them before, but Kunal has seven (7) sanity check rules that should be applied to every data set.

Unless, of course, the inability to answer simple questions about your data sets* is tolerated by your employer.

*Data sets become “yours” when you are asked to analyze them. Better to spot and report problems before they become evident in your results.
