Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 30, 2011

When Data Mining Goes Horribly Wrong

Filed under: Data Mining,Merging,Search Engines — Patrick Durusau @ 10:22 am

In “When Data Mining Goes Horribly Wrong,” Matthew Hurst brings us a cautionary tale about what can happen when “merging” decisions are made badly.

From the blog:

Consequently, when you see a details page – either on Google, Bing or some other search engine with a local search product – you are seeing information synthesized from multiple sources. Of course, these sources may differ in terms of their quality and, as a result, the values they provide for certain attributes.

When combining data from different sources, decisions have to be made as to firstly when to match (that is to say, assert that the data is about the same real world entity) and secondly how to merge (for example: should you take the phone number found in one source or another?).

This process – the conflation of data – is where you either succeed or fail.
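The two decisions in the quoted passage, when to match and how to merge, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Listing` fields, the source names, the trust scores in `SOURCE_TRUST`, and the exact-match rule on normalized name and address are all hypothetical, and none of it describes how Google, Bing, or any other search engine actually conflates local data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Listing:
    """One record about a business, as reported by a single source (hypothetical schema)."""
    source: str
    name: str
    address: str
    phone: str


# Hypothetical trust scores per source; the higher score wins on conflict.
SOURCE_TRUST = {"directory_a": 0.9, "crawler_b": 0.4}


def normalize(text: str) -> str:
    """Lowercase the text and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def is_match(a: Listing, b: Listing) -> bool:
    """Match step: assert that two records are about the same real-world entity.

    Assumed rule: normalized name and normalized address must both be equal.
    """
    return (normalize(a.name) == normalize(b.name)
            and normalize(a.address) == normalize(b.address))


def merge(a: Listing, b: Listing) -> Listing:
    """Merge step: take each value from the more trusted source,
    falling back to the other source only when the value is empty."""
    best, other = sorted(
        (a, b), key=lambda r: SOURCE_TRUST.get(r.source, 0.0), reverse=True
    )
    return Listing(
        source=f"{best.source}+{other.source}",
        name=best.name or other.name,
        address=best.address or other.address,
        phone=best.phone or other.phone,
    )


def conflate(a: Listing, b: Listing) -> Optional[Listing]:
    """Return the merged record if the two match, otherwise None."""
    return merge(a, b) if is_match(a, b) else None
```

For example, `conflate(Listing("directory_a", "Joe's Diner", "12 Main St.", "555-0100"), Listing("crawler_b", "JOES DINER", "12 main st", "555-0199"))` treats the two records as the same diner and keeps the phone number from the more trusted source. Note that both decisions live in rules, not in individual records, so a bad rule repeats its mistake on every pair it touches.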

Read Matthew’s post for encouraging signs that there is plenty of room for the use of topic maps.

What I find particularly amusing is that repairing the merge in this one case does nothing to prevent the same mistake from happening again and again.

Not much of a repair if the problem continues to happen elsewhere.
