You made it back from your downtime and your employer is glad to see you.
“Some of the ‘reconciled’ data we have been getting looks odd. Can you audit the data to make sure it is being reconciled correctly? Thanks!”
You remember that all you have are bare tokens.
AnHai Doan’s observation about after-the-fact mappings:
…the manual creation of semantic mappings has long been known to be extremely laborious and error-prone. For example, a recent project at the GTE telecommunications company sought to integrate 40 databases that have a total of 27,000 elements (i.e., attributes of relational tables) [LC00]. The project planners estimated that, without the database creators, just finding and documenting the semantic mappings among the elements would take more than 12 person years.
is ringing in your ears.
Mapping and creating sets of key/value pairs has to be an augmented process, but the existence of sets of key/value pairs enables auditing of the “reconciled data.”
Sets of key/value pairs you don’t have.
*****
PS: Sets of key/value pairs = subject proxies, with rules for “reconciliation” to use Googleease.
To say “key/value pairs” does not presume any particular methodology for storage or processing. Pick one. Let usefulness be your guide.
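To make the auditing point concrete, here is a minimal sketch in Python, offered as one possible illustration and not as a prescribed method. Everything in it is hypothetical: the element names, the keys (`subject`, `type`, `source`), and the `same_subject` rule are invented for the example. Each subject proxy is a plain dict of key/value pairs, and the audit re-checks every pair in a “reconciled” group against the stated rule.

```python
from itertools import combinations

# Hypothetical subject proxies: each one is a set of key/value pairs.
proxies = {
    "db1.cust_no": {"subject": "customer-id", "type": "integer", "source": "db1"},
    "db2.customer": {"subject": "customer-id", "type": "integer", "source": "db2"},
    "db3.acct": {"subject": "account-id", "type": "string", "source": "db3"},
}


def same_subject(a, b, keys=("subject",)):
    """Hypothetical reconciliation rule: two proxies match when they
    both carry every listed key and agree on its value."""
    return all(k in a and k in b and a[k] == b[k] for k in keys)


def audit(groups, proxies, rule=same_subject):
    """Return the pairs inside each reconciled group that the rule does not justify."""
    failures = []
    for group in groups:
        # Sort so the report is deterministic regardless of set ordering.
        for left, right in combinations(sorted(group), 2):
            if not rule(proxies[left], proxies[right]):
                failures.append((left, right))
    return failures


# The "reconciled" data under audit: groups of elements said to be the same subject.
reconciled = [{"db1.cust_no", "db2.customer"}, {"db2.customer", "db3.acct"}]
print(audit(reconciled, proxies))  # [('db2.customer', 'db3.acct')]
```

With bare tokens there is nothing for `rule` to inspect, which is the point: the audit is only possible because the key/value pairs and the rule are written down.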