LANGSEC: Taming the Weird Machines by Jacob Torrey.
From the post:
Introduction
I want to get some of my opinions on the current state of computer security out there, but first I want to highlight some of the most exciting, and in my views, promising recent developments in security: language-theoretic security (LangSec). Feel free to skip the next few paragraphs of background if you are familiar with the concepts to get to my analysis, otherwise, buckle up for a little ride!
Background
If I were to distill the core of the LangSec movement into a single thesis it would be this: The complexity of our computing systems (both software and hardware) have reached such a degree that data must treated as formally as code. A concrete example of this is return-oriented programming (ROP), where instead of executing shellcode loaded into memory by the attacker, a number of gadgets are found in existing code (such as libc) and their addresses chained together on the stack and as the ret
instruction is repeatedly called, the semantics of the gadgets is executed. This hybrid execution environment of using existing code and driving it with a buffer-overflow of data is one example of a weird machine.
Such weird machines crop up in many sorts of places: viz. the Intel x86 MMU that has been shown to be Turing-complete, the meta-data of ELF executable files that can drive execution in the loading & dynamic-linking stage, etc… This highlights the fact that data can be treated as instructions or code on these weird machines, much like Java byte-code is data to an x86 CPU, it is interpreted as code by the JVM. The JVM is a formal, explicit machine, much like the x86 CPU; weird machines on the other hand are ad hoc, implicit and generally not intentionally created. Many exploits are simply shellcode developed for a weird machine instead of the native CPU.
…
The “…data must be formally treated as code…” caught my eye as the reverse of “…code-as-data…,” which is a characteristic of Lisp and Clojure.
From a topic map/subject identity perspective, the problem is accepting implied subject identities and therefore implied properties and associations.
Being “implied” and not “explicit,” the interaction of subjects can change when someone, perhaps a hacker (or a fat-fingered user), supplies values that fall within the range of implied subject identities, properties, or associations.
Implied subject identities, properties, or associations, in code or data, reside in the minds of programmers, making detection well nigh impossible. At least prior to some hacker discovering an implied subject identity, property or association.
Avoiding implied subject identities, properties and associations will require work, loathsome to all programmers, but making subject identities explicit, enumerating their properties and allowed associations, in code and data, is a countable activity.
Having made subject identities explicit, capturing those results in code based on those explicit subject identities more robust. You won’t be piling implied subject identities on top of implied subject identities, or in plainer English, you won’t be writing cybersecurity software.
PS: Using a subject identity discipline does not mean you must document all of your code using XTM. You could but DSLs designed for your code/data may be more efficient.