Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 12, 2010

Szl – A Compiler and Runtime for the Sawzall Language

Filed under: Data Mining,Software — Patrick Durusau @ 5:52 pm

From the website:

Szl is a compiler and runtime for the Sawzall language. It includes support for statistical aggregation of values read or computed from the input. Google uses Sawzall to process log data generated by Google’s servers.

Since a Sawzall program processes one record of input at a time and does not preserve any state (values of variables) between records, it is well suited for execution as the map phase of a map-reduce. The library also includes support for the statistical aggregation that would be done in the reduce phase of a map-reduce.
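To make that division of labor concrete, here is a minimal sketch in Python, not Sawzall itself and not Szl's API. The names `process_record`, `emit`, and `run` are hypothetical, and a plain dictionary of running sums stands in for the aggregators. The point is only that the per-record function keeps nothing between calls and that all accumulation happens on the aggregation side.

```python
from collections import defaultdict


def process_record(record, emit):
    """Handle exactly one input record; nothing is kept between calls."""
    value = float(record)
    emit("count", 1)
    emit("total", value)
    emit("sum_of_squares", value * value)


def run(records):
    """Feed records one at a time and aggregate whatever is emitted."""
    # Hypothetical stand-in for sum aggregators (the "reduce" side).
    tables = defaultdict(float)

    def emit(table, value):
        tables[table] += value

    for record in records:
        process_record(record, emit)
    return dict(tables)


result = run(["1.0", "2.0", "3.0"])
mean = result["total"] / result["count"]
```

Because `process_record` depends only on its one record, the calls could run in any order or on different machines, which is what makes the per-record step a natural fit for the map phase.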

Reading one record at a time reminds me of the record linkage work that was developed in the late 1950s in medical epidemiology.

Of course, there the records were converted into a uniform representation, losing their original field labels (their equivalents to column headers, etc.). So the technique began with semantic loss.

I suppose you could say it was a lossy semantic integration technique.

Of course, that’s true for any semantic integration technique that doesn’t preserve the original language of a data set.
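A small Python sketch of what I mean by lossy. The records, field names, and mappings are all made up for illustration and do not come from any actual record linkage system. The first function maps each source onto a uniform schema and discards the original field names; the second does the same mapping but carries the original name along with each value.

```python
def normalize(record, field_map):
    """Lossy: rename source fields to the uniform schema and drop the originals."""
    return {
        field_map[name]: value
        for name, value in record.items()
        if name in field_map
    }


def normalize_keeping_source(record, field_map):
    """Same mapping, but each value remembers the field name it came from."""
    return {
        field_map[name]: {"value": value, "source_field": name}
        for name, value in record.items()
        if name in field_map
    }


# Hypothetical records from two sources that label the same facts differently.
hospital_a = {"Surname": "Smith", "DOB": "1931-04-02"}
hospital_b = {"family_name": "Smith", "birth_date": "1931-04-02"}

map_a = {"Surname": "last_name", "DOB": "birth_date"}
map_b = {"family_name": "last_name", "birth_date": "birth_date"}

lossy_a = normalize(hospital_a, map_a)
lossy_b = normalize(hospital_b, map_b)

kept_a = normalize_keeping_source(hospital_a, map_a)
```

After the lossy step, `lossy_a` and `lossy_b` are indistinguishable, which is convenient for matching but leaves no trace of how each source described its own data. The second form is one way to keep that trace.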

I will have to dig out some record linkage software to compare to Szl.
