Why Data Lineage is Your Secret Data Quality Weapon by Dylan Jones.
From the post:
Data lineage means many things to many people but it essentially refers to provenance – how do you prove where your data comes from?
It’s really a simple exercise. Just pull an imaginary string of data from where the information presents itself, back through the labyrinth of data stores and processing chains, until you can go no further.
I’m constantly amazed by why so few organisations practice sound data lineage management despite having fairly mature data quality or even data governance programs. On a side note, if ever there was a justification for the importance of data lineage management then just take a look at the brand damage caused by the recent European horse meat scandal.
But I digress. Why is data lineage your secret data quality weapon?
The simple answer is that data lineage forces your organisation to address two big issues that become all too apparent:
- Lack of ownership
- Lack of formal information chain design
Or to put it into a topic map context, can you trace what topics merged to create the topic you are now viewing?
And if you can’t trace, how can you audit the merging of topics?
And if you can’t audit, how do you determine the reliability of your topic map?
Reliability here means date (freshness), source (reliable or not), evaluation (by screeners), comparison (to other sources), and so on.
The same questions apply to all data aggregation systems.
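To make the tracing question concrete, here is a minimal sketch in Python of a merge that keeps its inputs rather than discarding them. Everything in it is an assumption made for illustration: the `Topic` class, the `merge` and `trace` helpers, the field names, and the sample sources and dates are hypothetical and do not come from any topic map standard or engine.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Topic:
    """A topic plus the provenance needed to audit it (hypothetical model)."""
    topic_id: str
    names: Set[str]
    source: Optional[str] = None      # where an original topic came from
    as_of: Optional[date] = None      # freshness of that source
    merged_from: List["Topic"] = field(default_factory=list)


def merge(new_id: str, *topics: Topic) -> Topic:
    """Merge topics, keeping references to the inputs instead of dropping them."""
    names: Set[str] = set()
    for t in topics:
        names |= t.names
    return Topic(new_id, names, merged_from=list(topics))


def trace(topic: Topic) -> List[Topic]:
    """Pull the string back through every merge until you can go no further."""
    if not topic.merged_from:
        return [topic]
    origins: List[Topic] = []
    for parent in topic.merged_from:
        origins.extend(trace(parent))
    return origins


# Made-up sample data, for illustration only.
a = Topic("t1", {"horse meat"}, source="supplier-feed", as_of=date(2013, 1, 15))
b = Topic("t2", {"equine meat"}, source="lab-report", as_of=date(2013, 2, 1))
c = Topic("t3", {"horsemeat"}, source="news-wire", as_of=date(2013, 2, 10))

merged = merge("m2", merge("m1", a, b), c)
origins = trace(merged)

# An audit can now ask about freshness and source for the merged topic.
oldest = min(t.as_of for t in origins)
sources = sorted(t.source for t in origins)
```

With the inputs retained, `origins` lists the three original topics behind `merged`, and `oldest` and `sources` give an auditor something to check against. A merge that kept only the combined names would leave none of those questions answerable.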
Or as Mr. Weasley tells Ginny:
“Never trust anything that can think for itself if you can’t see where it keeps its brain.”
Correction: Wesley -> Weasley. We had a minister friend over Sunday and were discussing the former, not the latter. 😉