Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 9, 2010

Schlepping From One Data Silo to Another (Part 1)

Filed under: Data Silos, by Patrick Durusau @ 6:59 pm

Talking about data silos is popular, particularly with a tone of indignation about someone else’s data silo. But, as with the weather, everyone talks about data silos and nobody does anything about them. In fact, if you look closely, all solutions to data silos are (drum roll please!) more data silos.

To be sure, some data silos are more common than others but every data format is a data silo to someone who doesn’t use that format. Take the detailed records from the Federal Election Commission (FEC) on candidates, parties and other committees as an example. Important stuff for US residents interested in who bought access to their local representative or senator.

The tutorial on how to convert the files to MS Access clues you in that the files use fixed-width fields, or as the tutorial puts it: “Notice that a columns’ start value is the previous columns’ start value plus its’ width value (except for the first column, which is always “1”).” That sounds real familiar.
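For the curious, the rule the tutorial quotes can be sketched in a few lines of Python. This is only an illustration: the function names and the sample widths (9, 3, 5) are made up for the example and are not taken from the FEC record layouts.

```python
def column_starts(widths):
    """Return the 1-based start position of each fixed-width column.

    The first column starts at 1; every later column starts at the
    previous column's start plus the previous column's width.
    """
    starts = []
    position = 1
    for width in widths:
        starts.append(position)
        position += width
    return starts


def split_fixed_width(line, widths):
    """Cut one record into fields using 1-based starts and widths."""
    fields = []
    for start, width in zip(column_starts(widths), widths):
        begin = start - 1  # Python slices are 0-based
        fields.append(line[begin:begin + width].strip())
    return fields


# Hypothetical layout: a 9-character id, a 3-character code, a 5-digit amount.
example_widths = [9, 3, 5]
example_record = "C00000001ABC00042"
example_fields = split_fixed_width(example_record, example_widths)
```

With the hypothetical widths above, the columns start at positions 1, 10 and 13, which is exactly the "previous start plus previous width" bookkeeping the tutorial asks you to do by hand.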

But then we return to the download page, where we read about how to handle overpunch characters. Overpunch characters? Oh, as in COBOL. Now that’s an old data silo.
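If you have never met one, here is a rough sketch of decoding a signed overpunch field in Python. It assumes the commonly documented zoned-decimal convention, where the last character carries both the units digit and the sign: "{" and A through I mean +0 to +9, "}" and J through R mean -0 to -9. Whether a given file follows exactly this table is something to check against that file's own documentation; the function name is mine.

```python
# Assumed mapping (common zoned-decimal convention, not verified
# against any particular file): index in the string = units digit.
POSITIVE_OVERPUNCH = "{ABCDEFGHI"
NEGATIVE_OVERPUNCH = "}JKLMNOPQR"


def decode_overpunch(field):
    """Convert a signed-overpunch numeric field to an int."""
    field = field.strip()
    if not field:
        raise ValueError("empty field")
    head, last = field[:-1], field[-1]
    if last in POSITIVE_OVERPUNCH:
        sign, digit = 1, POSITIVE_OVERPUNCH.index(last)
    elif last in NEGATIVE_OVERPUNCH:
        sign, digit = -1, NEGATIVE_OVERPUNCH.index(last)
    elif last in "0123456789":
        # No overpunch on the units position: treat as unsigned.
        sign, digit = 1, int(last)
    else:
        raise ValueError(f"unexpected final character: {last!r}")
    # int() rejects any non-digit left in the leading positions.
    return sign * int(head + str(digit))
```

Under that assumed table, a field such as 0000123} stands for minus 1230, because the closing brace is a zero with a negative sign punched over it.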

The point is that, for all the talk about data silos, we never escape them. Data formats are data silos. Get over it.

What we can do is make it possible to view information in one data silo as though it were held by another data silo. And if you must switch from one data silo to another, the time, cost and uncertainty of the migration can be reduced. (to be continued)

3 Comments

  1. How often do you write your blogs? I enjoy them a lot.

    Comment by Jeffrey Dustin — March 10, 2010 @ 4:40 pm

  2. I am trying to do one post a day. There are a lot of articles and books that I think are relevant to topic maps that I will be covering here. Thanks!

    Comment by Patrick Durusau — March 10, 2010 @ 8:55 pm

  3. Wikipedia has a tolerable account of signed overpunches at http://en.wikipedia.org/wiki/Signed_overpunch (although the table is incorrect since the convention preceded EBCDIC and character sets were smaller without any “{” and “}”).

    These predate COBOL by a lot. The convention was from punched cards where there were 12 rows labeled X Y and 0 to 9. One hole in the 0 to 9 rows of a column would make a digit. To specify a sign on the end of a numeric field, the units position would have a punch in X or Y also.

    Tabulating equipment knew how to handle this, based on board wiring and other tricks.

    When converted to tape or read directly into a computer, the normal treatment was to input overpunches and the digits beneath as alphanumeric characters (there being no way to know which was intended without knowing the use of that particular column of the card). Software for handling the field as numeric would adjust for the units-“overpunched” alphanumeric as a digit with the arithmetic sign of the field glued onto it.

    It was more involved than just wanting to save space, since handling separate sign characters, especially leading ones, was too much work for tabulating equipment and for early computers too.

    This is not quite such an insidious economy as the one that got us to the Y2K problem. It is now just a curiosity.

    Comment by orcmid — March 13, 2010 @ 12:51 pm
