Dark Matter: Driven by Data

Monday, May 9th, 2016

A delightful keynote by Dan Geer, presented at the 2015 LangSec Workshop at the IEEE Symposium on Security & Privacy Workshops, May 21, 2015, San Jose, CA.

Prepared text for the presentation.

A quote to interest you in watching the video:

Workshop organizer Meredith Patterson gave me a quotation from Taylor Hornby that I hadn’t seen. In it, Hornby succinctly states the kind of confusion we are in and which LANGSEC is all about:

The illusion that your program is manipulating its data is powerful. But it is an illusion: The data is controlling your program.

It almost appears that we are building weird machines on purpose, almost the weirder the better. Take big data and deep learning. Where data science spreads, a massive increase in tailorability to conditions follows. But even if Moore’s Law remains forever valid, there will never be enough computing hence data driven algorithms must favor efficiency above all else, yet the more efficient the algorithm, the less interrogatable it is,[MO] that is to say that the more optimized the algorithm is, the harder it is to know what the algorithm is really doing.[SFI]

And there is a feedback loop here: The more desirable some particular automation is judged to be, the more data it is given. The more data it is given, the more its data utilization efficiency matters. The more its data utilization efficiency matters, the more its algorithms will evolve to opaque operation. Above some threshold of dependence on such an algorithm in practice, there can be no going back. As such, if science wishes to be useful, preserving algorithm interrogatability despite efficiency-seeking, self-driven evolution is the research grade problem now on the table. If science does not pick this up, then Lessig’s characterization of code as law[LL] is fulfilled. But if code is law, what is a weird machine?

If you can’t interrogate an algorithm, could you interrogate a topic map that is an “inefficient” implementation of the algorithm?

Or, put differently, could there be two representations of the same algorithm: one that is “efficient” and one that can be “interrogated”?
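To make the question concrete, here is a toy sketch of my own, not anything from Geer's talk. It assumes the "algorithm" is something as small as computing the parity of a 32-bit value; the function names `parity_fast`, `parity_explained`, and `agree` are hypothetical. The first version is compact and opaque, the second is slow but can tell you why it answered as it did, and a small check confirms the two agree on the samples it is given.

```python
from typing import Iterable, List, Tuple


def parity_fast(x: int) -> int:
    """Efficient but opaque: parity of a non-negative 32-bit value via XOR folding."""
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    return x & 1


def parity_explained(x: int) -> Tuple[int, List[str]]:
    """Inefficient but interrogatable: returns the parity plus a trace of each step."""
    trace: List[str] = []
    ones = 0
    for bit in range(32):
        if (x >> bit) & 1:
            ones += 1
            trace.append(f"bit {bit} is set; running count = {ones}")
    result = ones % 2
    trace.append(f"{ones} set bits, so parity = {result}")
    return result, trace


def agree(samples: Iterable[int]) -> bool:
    """Check that both representations give the same answer on every sample."""
    return all(parity_fast(s) == parity_explained(s)[0] for s in samples)


if __name__ == "__main__":
    answer, why = parity_explained(0b1011)
    print(answer)
    print("\n".join(why))
    print(agree(range(1024)))
```

Agreement on a finite sample is evidence, not proof, that the two representations are the same algorithm, and for anything like deep learning even finding the "interrogatable" twin is the hard part.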

Read the paper version, but be aware that the video includes a very rich Q&A session after the presentation.