Semantics as Data by Oliver Kennedy.
From the post:
Something I’ve been getting drawn to more and more is the idea of computation as data.
This is one of the core precepts in PL and computation: any sort of computation can be encoded as data. Yet, this doesn’t fully capture the essence of what I’ve been seeing. Sure you can encode computation as data, but then what do you do with it? How do you make use of the fact that semantics can be encoded?
Let’s take this question from another perspective. In Databases, we’re used to imposing semantics on data. Data has meaning because we chose to give it meaning. The number 100,000 is meaningless, until I tell you that it’s the average salary of an employee at BigCorporateCo. Nevertheless, we can still ask questions in the abstract. Whatever semantics you use, 100,000 < 120,000. We can create abstractions (query languages) that allow us to ask questions about data, regardless of their semantics.
By comparison, an encoded computation carries its own semantics. This makes it harder to analyze, as the nature of those semantics is limited only by the type of encoding used to store the computation. But this doesn’t stop us from asking questions about the computation.
The Computation’s Effects
The simplest thing we can do is to ask a question about what it will compute. These questions span the range from the trivial to the typically intractable. For example, we can ask about…
- … what the computation will produce given a specific input, or a specific set of inputs.
- … what inputs will produce a given (range of) output(s).
- … whether a particular output is possible.
- … whether two computations are equivalent.
One particularly fun example in this space is Oracle’s Expression type [1]. An Expression stores (as a datatype) an arbitrary boolean expression with variables. The result of evaluating this expression on a given valuation of the variables can be injected into the WHERE clause of any SELECT statement. Notably, Expression objects can be indexed based on variable valuations. Given 3 such expressions: (A = 3), (A = 5), (A = 7), we can build an index to identify which expressions are satisfied for a particular valuation of A.
I find this beyond cool. Not only can Expression objects themselves be queried, it’s actually possible to build index structures to accelerate those queries.
Those familiar with probabilistic databases will note some convenient parallels between the expression type and Condition Columns used in C-Tables. Indeed, the concepts are almost identical. A C-Table encodes the semantics of the queries that went into its construction. When we compute a confidence in a C-Table (or row), what we’re effectively asking about is the fraction of the input space that the C-Table (row) produces an output for.
At every level of semantics there is semantic diversity.
Whether it is code or data, there are levels of semantics, each with semantic diversity.
You don’t have to resolve all semantic diversity, just enough to give you an advantage over others.