Semantic Tech the Key to Finding Meaning in the Media by Chris Lamb.
Chris starts off well enough:
News volume has moved from infoscarcity to infobesity. For the last hundred years, news in print was delivered in a container, called a newspaper, periodically, typically every twenty-four hours. The container constrained the product. The biggest constraints of the old paradigm were periodic delivery and limitations of column inches.
Now information continually bursts through our Google Readers, our cell phones, our tablets, display screens in elevators and grocery stores. Do we really need to read all 88,731 articles on the Bernie Madoff trial? Probably not. And that’s the dilemma for news organizations.
In the old metaphor, column-inches was the constraint. In the new metaphor, reader attention span becomes the constraint.
But then he quickly starts to fade:
Disambiguation is a technique to uniquely identify named entities: people, cities, and subjects. Disambiguation can identify that one article is about George Herbert Walker Bush, the 41st President of the US, and another article is about George Walker Bush, number 43. Similarly, the technology can distinguish between Lincoln Continental, the car, and Lincoln, Nebraska, the town. As part of the metadata, many tagging engines that disambiguate return unique identifiers called Uniform Resource Identifiers (URI). A URI is a pointer into a database.
If tagging creates machine readable assets, disambiguation is the connective tissue between these assets. Leveraging tagging and disambiguation technologies, applications can now connect content with very disparate origins. Today’s article on George W. Bush can be automatically linked to an article he wrote when he owned the Texas Ranger’s baseball team. Similarly the online bio of Bill Gates can be automatically tied to his online New Mexico arrest record in April 1975.
Apparently he didn’t read the paper “The communicative function of ambiguity in language.”
The problem with disambiguation is that you and I may well set up systems that disambiguate named entities differently. To be sure, we will resolve some of them the same way, but the question becomes: which ones? Is agreeing on 80% of them enough?
Depends on the application, doesn’t it? What if we are looking for a terrorist who may have fissionable material? Does 80% look good enough?
Ironic. Disambiguation is subject to the same ambiguity it set out to solve.
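To make the 80% question concrete, here is a minimal sketch in Python of how two independently built disambiguators might be compared. The two systems, their outputs, and the identifier strings are all hypothetical, invented purely for illustration; nothing here comes from Chris's article or from any real tagging engine.

```python
# Hypothetical outputs of two independently built disambiguation systems.
# Keys are mentions found in text; values are the entity identifiers assigned.
system_a = {
    "George Bush": "person/george-h-w-bush",
    "Lincoln": "place/lincoln-nebraska",
    "Bill Gates": "person/bill-gates",
    "Texas Rangers": "org/texas-rangers-baseball",
    "Madoff": "person/bernard-madoff",
}
system_b = {
    "George Bush": "person/george-w-bush",  # same mention, different president
    "Lincoln": "place/lincoln-nebraska",
    "Bill Gates": "person/bill-gates",
    "Texas Rangers": "org/texas-rangers-baseball",
    "Madoff": "person/bernard-madoff",
}


def agreement(a, b):
    """Return the fraction of shared mentions resolved identically,
    plus the sorted list of mentions on which the systems differ."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0, []
    disagreements = sorted(m for m in shared if a[m] != b[m])
    matches = len(shared) - len(disagreements)
    return matches / len(shared), disagreements


rate, disagreements = agreement(system_a, system_b)
print(f"agreement: {rate:.0%}")
print("disagree on:", disagreements)
```

The single number hides the part that matters: which mentions fall into the remaining 20%. In this made-up data the one disagreement is over which George Bush is meant, which is exactly the case disambiguation was supposed to settle.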
PS: URIs aren’t necessarily pointers into databases.
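A short Python sketch of the PS, using the standard library's `urllib.parse.urlparse`. The example URIs are illustrative assumptions (the `example.org` ones are made up). All four are well-formed URIs, but only the first names a host that something could be fetched from, and even that says nothing about whether a database sits behind it.

```python
from urllib.parse import urlparse

# Illustrative URIs; the example.org identifiers are hypothetical.
uris = [
    "http://example.org/entity/lincoln-nebraska",  # locator with a host
    "urn:isbn:0451450523",                         # names a book, no location
    "mailto:someone@example.org",                  # names a mailbox
    "tag:example.org,2012:george-w-bush",          # tag URI, a pure identifier
]

for uri in uris:
    parts = urlparse(uri)
    # netloc is empty for URIs that name something without locating it.
    print(parts.scheme, parts.netloc or "(no host)", parts.path)

schemes = [urlparse(u).scheme for u in uris]
hosts = [urlparse(u).netloc for u in uris]
```

A URI identifies; whether it also resolves to a record anywhere is a separate question that depends on the scheme and on whoever minted it.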