Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 11, 2012

Crowdsourcing – A Solution to your “Bad Data” Problems

Filed under: Crowd Sourcing,Data Quality — Patrick Durusau @ 3:11 pm

Crowdsourcing – A Solution to your “Bad Data” Problems by Hollis Tibbetts.

Hollis writes:

Data problems – whether they be inaccurate data, incomplete data, data categorization issues, duplicate data, data in need of enrichment – are age-old.

IT executives consistently agree that data quality/data consistency is one of the biggest roadblocks to them getting full value from their data. Especially in today’s information-driven businesses, this issue is more critical than ever.

Technology, however, has not done much to help us solve the problem – in fact, technology has resulted in the increasingly fast creation of mountains of “bad data”, while doing very little to help organizations deal with the problem.

One “technology” holds much promise in helping organizations mitigate this issue – crowdsourcing. I put the word technology in quotation marks – as it’s really people that solve the problem, but it’s an underlying technology layer that makes it accurate, scalable, distributed, connectable, elastic and fast. In an article earlier this week, I referred to it as “Crowd Computing”.

Crowd Computing – for Data Problems

The Human “Crowd Computing” model is an ideal approach for newly entered data that needs to either be validated or enriched in near-realtime, or for existing data that needs to be cleansed, validated, de-duplicated and enriched. Typical data issues where this model is applicable include:

  • Verification of correctness
  • Data conflict and resolution between different data sources
  • Judgment calls (such as determining relevance, format or general “moderation”)
  • “Fuzzy” referential integrity judgment
  • Data error corrections
  • Data enrichment or enhancement
  • Classification of data based on attributes into categories
  • De-duplication of data items
  • Sentiment analysis
  • Data merging
  • Image data – correctness, appropriateness, appeal, quality
  • Transcription (e.g. hand-written comments, scanned content)
  • Translation

In areas such as the Data Warehouse, Master Data Management or Customer Data Management, Marketing databases, catalogs, sales force automation data, inventory data – this approach is ideal – or any time that business data needs to be enriched as part of a business process.

Hollis has a number of good points. But the choice doesn’t have to be “big data/iron” versus “crowd computing.”

You are more likely to get useful results out of some combination of the two.

Make “big data/iron” responsible for raw access, processing, and visualization in an interactive environment, with the semantics supplied by the “crowd computers.”
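To make that division of labor concrete, here is a minimal sketch for one item from Hollis's list, de-duplication. It is my own illustration, not anything Hollis or a crowdsourcing vendor describes: the function names (`similarity`, `triage`, `crowd_verdict`) and the thresholds are hypothetical, and plain string similarity stands in for whatever the "iron" would really compute. The machine decides the easy pairs and routes only the ambiguous ones to people, whose answers are combined by majority vote.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Machine-side score: case-insensitive character similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(records, auto_merge=0.95, auto_reject=0.50):
    """Split record pairs into machine-decided merges and crowd questions.

    The thresholds are illustrative assumptions, not tuned values.
    """
    merged, for_crowd = [], []
    for a, b in combinations(records, 2):
        score = similarity(a, b)
        if score >= auto_merge:
            merged.append((a, b))      # confident enough to merge automatically
        elif score > auto_reject:
            for_crowd.append((a, b))   # ambiguous: ask people
        # pairs scoring at or below auto_reject are treated as distinct
    return merged, for_crowd


def crowd_verdict(votes, min_agreement=0.6):
    """Majority vote over boolean "same entity?" answers.

    Returns None when there are no votes or no sufficiently clear majority.
    """
    if not votes:
        return None
    answer, count = Counter(votes).most_common(1)[0]
    return answer if count / len(votes) >= min_agreement else None


if __name__ == "__main__":
    records = ["Acme Corp.", "ACME Corp.", "Acme Corporation", "Zenith Ltd"]
    merged, for_crowd = triage(records)
    print("machine merged:", merged)
    print("sent to crowd:", for_crowd)
    print("crowd says:", crowd_verdict([True, True, False]))
```

A `None` from `crowd_verdict` is the interesting case: it marks a pair that neither the machine nor the crowd could settle, which is where more votes or an expert reviewer would come in.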

And vet participants on both sides in real time. It would be a novel thing to have firms competing to supply the interactive environment and being paid on the basis of the “crowd computers” who preferred it or got better results with it.

That is a ways past where Hollis is going, but I think his argument leads naturally in that direction.
