Error Handling, Validation and Cleansing with Semantic Types and Mappings by Michael Tarallo.
From the post:
expressor ETL applications can set up data validation rules and error handling in a few ways. The traditional approach with many ETL tools is to build in the rules using the various ETL operators. A more streamlined approach is to also use the power of expressor Semantic Mappings and Semantic Types.
- Semantic Mappings specify how a variety of characteristics are to be handled when string, number, and date-time data types are mapped from the physical schema (your source) to the logical semantic layer known as the Semantic Type.
- Semantic Types allow you to define, in business terms, how you want the data and the data model to be represented.
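To make the physical-to-logical step concrete, here is a minimal Python sketch of what a semantic mapping does conceptually: each physical (source) string field is converted into a typed logical attribute, with per-type handling for strings, numbers and dates. This is not expressor's API (expressor configures mappings in Studio). `FieldMapping`, `map_record`, the column names and the "unparseable becomes null" policy are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Callable


# Hypothetical stand-in for a semantic mapping: how one physical (source)
# field is converted into a typed attribute of the logical semantic type.
@dataclass
class FieldMapping:
    source: str                     # column name in the physical schema
    target: str                     # attribute name in the semantic type
    parse: Callable[[str], object]  # converts the raw string to a typed value


def parse_string(raw: str) -> str:
    # String handling: trim surrounding whitespace.
    return raw.strip()


def parse_number(raw: str) -> Decimal:
    # Number handling: drop thousands separators, e.g. "1,234.50".
    return Decimal(raw.strip().replace(",", ""))


def parse_date(raw: str) -> date:
    # Date-time handling: assumes ISO "YYYY-MM-DD" in the source.
    return datetime.strptime(raw.strip(), "%Y-%m-%d").date()


def map_record(row: dict, mappings: list) -> dict:
    """Map a physical row to logical attributes; missing or unparseable values become None."""
    result = {}
    for m in mappings:
        raw = row.get(m.source)
        try:
            result[m.target] = None if raw is None else m.parse(raw)
        except (ValueError, InvalidOperation):
            result[m.target] = None
    return result


# Hypothetical physical column names mapped to business-friendly attributes.
MAPPINGS = [
    FieldMapping("CUST_NM", "customer_name", parse_string),
    FieldMapping("ORD_AMT", "order_amount", parse_number),
    FieldMapping("ORD_DT", "order_date", parse_date),
]
```

For example, `map_record({"CUST_NM": "  Acme Corp ", "ORD_AMT": "1,234.50", "ORD_DT": "2011-13-01"}, MAPPINGS)` trims the name, parses the amount to `Decimal("1234.50")`, and turns the impossible date (month 13) into `None`.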
Both of these methods provide a means of validating attribute data and invoking corrective actions when rules are violated.
- Data Validation rules can be in the form of pattern matching, value ranges, character lengths, formatting, currency and other specific data type constraints.
- Corrective Actions can be in the form of null, default and correction value replacements as well as specific operator handling to either skip records or reject them to another operator.
NOTE: Semantic Mapping rules are applied before Semantic Type rules.
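A rough Python sketch of the validation rules, corrective actions and rule ordering described above. Again, this is not how expressor implements or exposes these features; `Rule`, `Action`, `apply_rules`, `process` and the example rules are hypothetical. The point it illustrates is the ordering: a mapping-level rule that corrects `" ny "` to `"NY"` runs first, so the stricter type-level pattern rule accepts the record instead of rejecting it.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    USE_NULL = "null"        # replace the bad value with None
    USE_DEFAULT = "default"  # replace it with a configured default
    CORRECT = "correct"      # replace it with a corrected value
    SKIP = "skip"            # drop the record silently
    REJECT = "reject"        # route the record to a reject output


@dataclass
class Rule:  # hypothetical rule definition, not an expressor type
    attribute: str
    check: Callable[[object], bool]  # True when the value is valid
    action: Action                   # what to do when check fails
    default: object = None
    correct: Optional[Callable[[object], object]] = None


def apply_rules(record: dict, rules: list):
    """Return ("ok", record), ("skip", None) or ("reject", record)."""
    out = dict(record)
    for rule in rules:
        value = out.get(rule.attribute)
        if rule.check(value):
            continue
        if rule.action is Action.USE_NULL:
            out[rule.attribute] = None
        elif rule.action is Action.USE_DEFAULT:
            out[rule.attribute] = rule.default
        elif rule.action is Action.CORRECT:
            out[rule.attribute] = rule.correct(value)
        elif rule.action is Action.SKIP:
            return "skip", None
        else:  # Action.REJECT
            return "reject", record
    return "ok", out


def process(record: dict, mapping_rules: list, type_rules: list):
    # Mapping-level rules run first; only surviving records reach type-level rules.
    status, rec = apply_rules(record, mapping_rules)
    if status != "ok":
        return status, rec
    return apply_rules(rec, type_rules)


MAPPING_RULES = [
    Rule("order_id", lambda v: v is not None, Action.SKIP),
    Rule("state", lambda v: isinstance(v, str) and v == v.strip().upper(),
         Action.CORRECT, correct=lambda v: str(v).strip().upper()),
]
TYPE_RULES = [
    # Pattern-matching rule: exactly two capital letters, else reject.
    Rule("state", lambda v: isinstance(v, str) and re.fullmatch(r"[A-Z]{2}", v) is not None,
         Action.REJECT),
    # Value-range rule: out-of-range quantities fall back to a default.
    Rule("quantity", lambda v: isinstance(v, int) and 1 <= v <= 1000,
         Action.USE_DEFAULT, default=1),
]
```

With these rules, `process({"order_id": 1, "state": " ny ", "quantity": 0}, MAPPING_RULES, TYPE_RULES)` yields `("ok", {"order_id": 1, "state": "NY", "quantity": 1})`, whereas applying `TYPE_RULES` alone to the same record would reject it.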
Read more here:
I am still trying to find time to test at least the community edition of the software.
What “extra” time I have now is being soaked up configuring/patching Eclipse to build Nutch, to correct a known problem between Nutch and Solr. I suspect you could sell a packaged version of open source software that has all the paths and classes hard coded into the software. No more setting paths, having inconsistent library versions, etc. Just unpack and run. Store data in a separate directory. When a new version comes out, simply rm -R the install directory and unpack the new one. That should include the “.” files too. Configuration/patching isn’t a good use of anyone’s time. (Unless you want to sell the results. 😉 )
But I will get to it! Unless someone beats me to it and wants to send me a link to their post that I can cite and credit on my blog.
Two things I would change about Michael’s blog:
Prerequisite: Knowledge of expressor Studio and dataflows. You can find tutorials and documentation here
To read:
Prerequisites:
- expressor software (community or 30-day free trial) here.
- Knowledge of expressor Studio and dataflows. You can find tutorials and documentation here
And, well, this one is not about Michael’s blog but the expressor download page: if the desktop/community edition is “expressor Studio,” then call it that on the download page.
Don’t use different names for a software package and expect users to sort it out. Not if you want to encourage downloads and sales anyway. Surveys show you have to wait until they are paying customers to start abusing them. 😉