ETL Demo with Data From Data.Gov by Kevin E. Kline.
From the post:
A little over a month ago, I wrote an article (Is There Such a Thing as Easy ETL) about expressor software and their desktop ETL application, expressor Studio. I wrote about how it seemed much easier than the native ETL tools in SQL Server when I was reading up on the tool, but that the “proof would be in the pudding” so to speak when I actually tried it out loading some free (and incredibly useful) data from the US federal data clearinghouse, Data.Gov.
If you’d rather not read my entire previous article – quick recap, expressor Studio uses “semantic types” to manage and abstract mappings between sources and targets. In essence, these types are used for describing data in terms that humans can understand—instead of describing data in terms that computers can understand. The idea of semantic abstraction is quite intriguing and it gave me an excuse to use data from data.gov to build a quick demo. You can download the complete data set I used from the following location: International Statistics. (Note: I have this dream that I’m going to someday download all of this free statistical data sets, build a bunch of amazing and high-value analytics, and make a mint. If, instead, YOU do all of those things, then please pay to send at least one of my seven kids to college in repayment for the inspiration. I’m not kidding. I have SEVEN kids. God help me).
The federal government, to their credit, has made great progress in making data available. However, there is a big difference between accessing data and understanding data. When I first looked at one of the data files I downloaded, I figured it was going to take me years to decrypt the field names. Luckily, I did notice an Excel file with field names and descriptions. Seriously, there are single letter field names in these files where the field name “G” has a description of “Age group indicator” (Oh Wow). See the figure below.
I like Kevin’s point about the difference between “accessing data and understanding data.”