Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 10, 2013

Reverse Entity Recognition? (Scrubbing)

Filed under: Entities,Entity Resolution,Privacy — Patrick Durusau @ 12:51 pm

Improving privacy with language technologies by Rob Munro.

From the post:

One downside of the kind of technologies that we build at Idibon is that they can be used to compromise people’s privacy and, by extension, their safety. Any technology can be used for positive and negative purposes and as engineers we have a responsibility to ensure that what we create is for a better world.

For language technologies, the most negative application, by far, is eavesdropping: discovering information about people by monitoring their online communications and using that information in ways that harm the individuals. This can be something as direct and targeted as exposing the identities of at-risk individuals in a war-zone or it can be the broad expansion of government surveillance. The engineers at many technology companies announced their opposition to the latter with a loud, unified call today to reform government surveillance.

One way that privacy can be compromised at scale is the use of technology known as “named entity recognition”, which identifies the names of people, places, organizations, and other types of real-world entities in text. Given millions of sentences of text, named entity recognition can extract the names and addresses of everybody in the data in just a few seconds. But the same technology that can be used to uncover personally identifying information (PII) can also be used to remove the personally identifying information from the text. This is known as anonymizing or simply “scrubbing”.

Rob agrees that entity recognition can invade your personal privacy, but points out it can also protect your privacy.

You may think your “handle” on one or more networks provides privacy, but it would not take much data to disappoint most people.

Entity recognition software can scrub data to remove the “tells” that may identify you.
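
To make the idea concrete, here is a minimal sketch of scrubbing in Python. It is not Idibon's method or any particular library's API. The name list (KNOWN_NAMES), the two regular expressions, and the scrub function are all hypothetical stand-ins for a trained named entity recognizer, which a real system would use instead.

```python
import re

# Hypothetical gazetteer standing in for a trained named entity recognizer.
KNOWN_NAMES = ["Alice Example", "Bob Sample"]

# Illustrative patterns only; real PII detection needs far broader coverage.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def scrub(text, names=KNOWN_NAMES):
    """Replace each recognized entity with a placeholder for its type."""
    # Names first: match each known name literally, ignoring case.
    for name in names:
        text = re.sub(re.escape(name), "[PERSON]", text, flags=re.IGNORECASE)
    # Then pattern-based identifiers such as email addresses and phone numbers.
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(scrub("Alice Example (alice@example.org) called 555-123-4567 from Kabul."))
```

Note that the place name in the sample sentence passes through untouched, because this sketch knows nothing about locations. Whatever the recognizer misses stays in the data.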

How much scrubbing is necessary depends on the data and the consequences of discovery.

Entity recognition is usually thought of as recognizing names and places, but it could just as easily be content analysis to recognize a particular author.

That would require more sophisticated “scrubbing” than entity recognition can support.
