Reidentification as Basic Science

Reidentification as Basic Science by Arvind Narayanan.

From the post:

What really drives reidentification researchers? Do we publish these demonstrations to alert individuals to privacy risks? To shame companies? For personal glory? If our goal is to improve privacy, are we doing it in the best way possible?

In this post I’d like to discuss my own motivations as a reidentification researcher, without speaking for anyone else. Certainly I care about improving privacy outcomes, in the sense of making sure that companies, governments and others don’t get away with mathematically unsound promises about the privacy of consumers’ data. But there is a quite different goal I care about at least as much: reidentification algorithms. These algorithms are my primary object of study, and so I see reidentification research partly as basic science.

Let me elaborate on why reidentification algorithms are interesting and important. First, they yield fundamental insights about people — our interests, preferences, behavior, and connections — as reflected in the datasets collected about us. Second, as is the case with most basic science, these algorithms turn out to have a variety of applications other than reidentification, both for good and bad. Let us consider some of these.


A nice introduction to the major contours of reidentification, which the IT Law Wiki defines as:

Data re-identification is the process by which personal data is matched with its true owner.

In topic map speak, though, I would usually say that personal data was used to identify its owner.

In a reidentification context, some effort has been made to obscure that relationship, so matching may be the better usage.
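As a rough illustration of "matching" in this sense, here is a minimal sketch in Python. It is not taken from Narayanan's post and is not one of the algorithms he studies; it is a plain exact-match join on a few shared attributes, which I am assuming as the simplest possible case. The records, the field names, and the `reidentify` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical "de-identified" records: names removed, other attributes kept.
deidentified = [
    {"zip": "02138", "birth_year": 1945, "sex": "M", "attribute": "A"},
    {"zip": "02139", "birth_year": 1980, "sex": "F", "attribute": "B"},
]

# Hypothetical auxiliary records that still carry names.
public = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "zip": "02138", "birth_year": 1945, "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_year": 1990, "sex": "F"},
]

# Assumption: these fields appear in both datasets and act as a join key.
SHARED_FIELDS = ("zip", "birth_year", "sex")


def key(record):
    """Build a hashable key from the shared fields of a record."""
    return tuple(record[field] for field in SHARED_FIELDS)


def reidentify(deidentified, public):
    """Return (name, record) pairs where exactly one named person matches."""
    index = defaultdict(list)
    for person in public:
        index[key(person)].append(person["name"])

    matches = []
    for record in deidentified:
        candidates = index.get(key(record), [])
        # Only an unambiguous match counts; ties are left unresolved.
        if len(candidates) == 1:
            matches.append((candidates[0], record))
    return matches


if __name__ == "__main__":
    for name, record in reidentify(deidentified, public):
        print(name, "->", record["attribute"])
```

Only unambiguous matches are reported: if two named people share the same key, the sketch declines to pick one. Real reidentification work typically has to cope with noisy or partial attributes, which this example ignores.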

Depending on your data sources, reidentification is something you may encounter when building a topic map.

I first saw this at Pete Warden’s Five short links.
