Szl – A Compiler and Runtime for the Sawzall Language
From the website:
Szl is a compiler and runtime for the Sawzall language. It includes support for statistical aggregation of values read or computed from the input. Google uses Sawzall to process log data generated by Google’s servers.
Since a Sawzall program processes one record of input at a time and does not preserve any state (values of variables) between records, it is well suited for execution as the map phase of a map-reduce. The library also includes support for the statistical aggregation that would be done in the reduce phase of a map-reduce.
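To make the record-at-a-time model concrete, here is a minimal Python sketch. It is not Szl or Sawzall code and does not use their API: map_record, reduce_sums and run are hypothetical names, and the input is assumed to be one number per line. The map step sees a single record and keeps no state between records. The reduce step sums the emitted values per table, roughly the way a Sawzall sum table aggregates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Emit = Tuple[str, float]  # (table name, value) pair emitted for one record


def map_record(line: str) -> List[Emit]:
    """Process one record with no state carried over from earlier records.

    Assumption (hypothetical input format): each record is one number per line.
    """
    x = float(line)
    return [("count", 1.0), ("total", x), ("sum_of_squares", x * x)]


def reduce_sums(emits: Iterable[Emit]) -> Dict[str, float]:
    """Aggregate emitted values per table name by summing them."""
    tables: Dict[str, float] = defaultdict(float)
    for name, value in emits:
        tables[name] += value
    return dict(tables)


def run(records: Iterable[str]) -> Dict[str, float]:
    # Map phase: each record is handled independently, so this step
    # could be spread across many machines.
    emitted = [e for rec in records for e in map_record(rec)]
    # Reduce phase: combine everything emitted into per-table totals.
    return reduce_sums(emitted)


if __name__ == "__main__":
    print(run(["1", "2", "3"]))
```

Because map_record depends only on its argument, the order in which records are processed, or the machine that processes them, does not change the totals.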
The reading of one record at a time reminds me of the record linkage work developed in medical epidemiology in the late 1950s.
Of course, in record linkage the records were converted into a uniform representation, losing whatever served as their original column headers, etc. So the technique began with semantic loss.
I suppose you could say it was a lossy semantic integration technique.
Of course, that’s true for any semantic integration technique that doesn’t preserve the original language of a data set.
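To illustrate the kind of loss I mean, here is a minimal Python sketch. The records, headers, mapping and function names are all hypothetical, and the matching is naive exact comparison, not the probabilistic matching used in the historical record linkage work. normalize_lossy maps both sources to the same uniform tuple, after which nothing records that one file called the field "surname" and the other "LAST_NAME". normalize_keeping_headers produces the same values but carries each original header along with its value.

```python
from typing import Dict, Tuple

# Hypothetical records from two files that label the same facts differently.
SOURCE_A = {"surname": "Smith", "dob": "1931-04-02"}
SOURCE_B = {"LAST_NAME": "SMITH", "BIRTH_DATE": "1931-04-02"}

# Hypothetical mapping from each source's headers to a uniform field name.
HEADER_MAP = {
    "surname": "last_name",
    "LAST_NAME": "last_name",
    "dob": "birth_date",
    "BIRTH_DATE": "birth_date",
}


def normalize_lossy(record: Dict[str, str]) -> Tuple[str, str]:
    """Uniform (last_name, birth_date) tuple; the original headers are discarded."""
    fields = {HEADER_MAP[k]: v.strip().upper() for k, v in record.items()}
    return (fields["last_name"], fields["birth_date"])


def normalize_keeping_headers(record: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Same uniform values, but each one remembers the header it came from."""
    return {HEADER_MAP[k]: (v.strip().upper(), k) for k, v in record.items()}


def link(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Naive exact-match linkage on the lossy normalized tuples."""
    return normalize_lossy(a) == normalize_lossy(b)


if __name__ == "__main__":
    print(link(SOURCE_A, SOURCE_B))
    print(normalize_keeping_headers(SOURCE_B))
```

Both versions link the two records equally well; only the second still says what each source originally called its fields.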
I will have to dig out some record linkage software to compare to Szl.