What kinds of metadata are important anyway? by Curt Monash.
From the post:
In today’s post about HCatalog, I noted that the Hadoop/HCatalog community didn’t necessarily understand all the kinds of metadata that enterprises need and want, especially in the context of data integration and ETL and ELT (Extract/Transform/Load/Transform). That raises a natural question — what kinds of metadata do users need or want? In the hope of spurring discussion, from vendors and users alike, I’m splitting this question out into a separate post.
Please comment with your thoughts about ETL-related metadata needs. The conversation needs to advance.
In the relational world, there are at least three kinds of metadata:
- Definitional information about data structures, without which you can’t have a relational database at all. That area seems binary; either you have enough to make sense of your data or you don’t.
- Statistics about columns and tables, such as the most frequent values and how often they occur, which are kept for the purpose of optimization. Those seem to be nice-to-haves more than must-haves. The more information of this kind you have, the more chances you have to save resources.
- Historical and security information about data. This is where things get really complicated. It’s also where Hadoop is still in the “So what exactly should we build?” stage of design.
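As a rough illustration of how the three kinds of relational metadata differ, here is a minimal Python sketch. The class names, fields, and the `collect_statistics` helper are hypothetical and not taken from Monash's post or from any real catalog such as HCatalog. It shows definitional metadata as a column definition, optimizer statistics as the most frequent values and their counts, and historical/security metadata as a simple provenance and access record.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ColumnDefinition:
    # Definitional metadata (hypothetical shape): without it, stored values can't be interpreted.
    name: str
    sql_type: str
    nullable: bool = True


@dataclass
class ColumnStatistics:
    # Optimizer statistics: a nice-to-have that lets a planner estimate costs.
    row_count: int
    distinct_count: int
    most_common: list  # (value, count) pairs, most frequent first


@dataclass
class ColumnHistory:
    # Historical and security metadata (hypothetical fields): provenance and access.
    source_system: str
    loaded_by: str
    readable_by: set = field(default_factory=set)


def collect_statistics(values, top_n=3):
    """Compute simple optimizer-style statistics for one column's values."""
    counts = Counter(values)
    return ColumnStatistics(
        row_count=len(values),
        distinct_count=len(counts),
        most_common=counts.most_common(top_n),
    )


# Usage: one column described by all three kinds of metadata.
region = ColumnDefinition("region", "VARCHAR(16)", nullable=False)
stats = collect_statistics(["east", "west", "east", "north", "east", "west"])
history = ColumnHistory("legacy_crm", "nightly_etl", readable_by={"analysts"})
print(stats.most_common)  # [('east', 3), ('west', 2), ('north', 1)]
```

The first kind is all-or-nothing, while the statistics can be as sparse or as rich as you like. The third kind is only a stub here, which reflects how open that design question still is.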
I would assume that data structures are meant to carry information about, and possibly even the identification of, one or more subjects.
It seems odd to me that Curt's post never mentions which subjects are meant to be identified, much less what identifies those subjects.
Leaving subjects and their identifications unmentioned obviously works: sys admins migrate data year in and year out, more or less successfully. But it works only at a fairly high cost.
If we knew which subjects the stored data was about, and how those subjects were identified, migration to other systems would be less uncertain and less costly.
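To make the migration argument concrete, here is a minimal Python sketch, assuming each system records, for every column, which subject it identifies and by what property. `SubjectBinding`, `map_columns`, and all the column and subject names are hypothetical illustrations, not part of any existing metadata standard or tool. With that information recorded on both sides, matching columns across systems becomes a lookup instead of guesswork.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectBinding:
    # Hypothetical record: which subject a column identifies, and how.
    column: str
    subject: str        # e.g. "customer"
    identified_by: str  # e.g. "account number"


def map_columns(source, target):
    """Pair source and target columns that identify the same subject the same way.

    Returns (mapping, unmatched): mapping is {source_column: target_column},
    unmatched lists source columns with no counterpart in the target.
    """
    index = {(b.subject, b.identified_by): b.column for b in target}
    mapping, unmatched = {}, []
    for b in source:
        key = (b.subject, b.identified_by)
        if key in index:
            mapping[b.column] = index[key]
        else:
            unmatched.append(b.column)
    return mapping, unmatched


# Usage: a hypothetical legacy schema migrated to a hypothetical new one.
legacy = [
    SubjectBinding("cust_no", "customer", "account number"),
    SubjectBinding("tax_id", "customer", "tax ID"),
    SubjectBinding("prod_cd", "product", "SKU"),
]
new_system = [
    SubjectBinding("customer_account", "customer", "account number"),
    SubjectBinding("sku", "product", "SKU"),
]
mapping, unmatched = map_columns(legacy, new_system)
print(mapping)    # {'cust_no': 'customer_account', 'prod_cd': 'sku'}
print(unmatched)  # ['tax_id']
```

The point is not the code but what it requires. The unmatched list shows exactly where a migration needs human attention, which is the uncertainty (and cost) that goes unrecorded when subject identity is left out of the metadata.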
Oh, that reminds me, have you decided what “key finding” Oracle left out of its summary? It comes up here as well. Monday is only a day or so away.