Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

January 3, 2014

Data Without Meaning? [Dark Data]

Filed under: Data,Data Analysis,Data Mining,Data Quality,Data Silos — Patrick Durusau @ 5:47 pm

I was reading IDC: Tons of Customer Data Going to Waste by Beth Schultz when I saw:

As much as companies understand the need for data and analytics and are evolving their relationships with both, they’re really not moving quickly enough, Schaub suggested during an IDC webinar earlier this week about the firm’s top 10 predictions for CMOs in 2014. “The aspiration is know that customer, and know what the customer wants at every single touch point. This is going to be impossible in today’s siloed, channel orientation.”

Companies must use analytics to help take today’s multichannel reality and recreate “the intimacy of the corner store,” she added.

Yes, great idea. But as IDC pointed out in the prediction I found most disturbing — especially with how much we hear about customer analytics — gobs of data go unused. In 2014, IDC predicted, “80% of customer data will be wasted due to immature enterprise data ‘value chains.’ ” That has to set CMOs to shivering, and certainly IDC found it surprising, according to Schaub.

Neither of those is all that surprising, not the 80% figure nor the cause being “immature enterprise data ‘value chains.’”

What did surprise me was:

IDC’s data group researchers say that some 80% of data collected has no meaning whatsoever, Schaub said.

I’m willing to bet the wasted 80% of consumer data and the “no meaning” 80% of consumer data are the same 80%.

Think about it.

If your information chain isn’t associating meaning with the data you collect, the data may as well be streaming to /dev/null.

The data isn’t without meaning; you simply failed to capture it. That is not the same thing as the data having “no meaning.”

Failing to capture meaning along with data is one way to produce what I call “dark data.”
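As a purely illustrative sketch (the field names and values below are invented, not from the IDC report), the difference between capturing meaning and losing it can be as small as whether the collection step preserves the labels:

    # Collected without its meaning: the values survive, the semantics do not.
    dark_row = ("4417", "2014-01-03", "38.99")

    # The same values with their meaning captured alongside them.
    lit_row = {
        "customer_id": "4417",          # who
        "purchase_date": "2014-01-03",  # when
        "order_total_usd": "38.99",     # what, and in what currency
    }

    # Downstream analytics can only use what the value chain preserved.
    print(dark_row[2])                 # 38.99 ... of what? dollars? a score?
    print(lit_row["order_total_usd"])  # 38.99, unambiguously an order total in USD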

I first saw this in a tweet by Gregory Piatetsky.

5 Comments

  1. Maybe it’s just a semantic distinction, but I don’t think a system, or the creator of a system, can capture, provide, or create meaning. Since meaning is assigned by the reader or user of the data, all the generator of the data can do is capture and provide context that helps the user create their own meaning.

    Comment by clemp — January 3, 2014 @ 10:31 pm

  2. Interesting. I don’t think that was the distinction Beth was making. Her distinction was based upon an implied notion of precision. I think an author/reader can make coarser or finer distinctions in terms of precision.

    Whether *another* reader will understand those distinctions the same way, which I think lies at the heart of your question, we can never know.

    A reader (the author after writing, or another reader) is always imbuing the text with meaning, so in that sense I agree with you.

    But an information system can represent the tokens *a* reader associates with those meanings and, if provided, mappings to other tokens that have the same meaning for that reader.

    With your nudging, I think we can say that a topic map can succeed because the rules for merging *have no meaning* for the machine, only for the user who authored them.

    That is, a topic map engine does not stop to evaluate what “meaning” it associates with some token, because it associates none. It mechanically applies a set of rules specified by a reader to produce a result that is meaningful to that reader. (A rough sketch of this mechanical view appears after the comments.)

    I had never thought about it quite that way. Thanks!

    Comment by Patrick Durusau — January 4, 2014 @ 10:18 am

  3. Along the lines of the discussion you and Aki were having on LinkedIn, if there were some mechanism that allowed the *reader* to define or select which merging rules to apply, that would support the reader in understanding (or assigning meaning to) the topics by providing greater context for each topic being reviewed. (A sketch of reader-selected merging also appears after the comments.) My understanding of current Topic Map software is that the merging “rules” are defined by the *author* of the Topic Map, which, IMO, artificially narrows the potential uses for that map because it puts the burden on the author to try to think through all the ways the data might be used when defining the rules. The current alternative (and the one I think most Topic Map authors choose) is to leave some topics or associations out of the map to avoid the problem of everything merging with everything. However, that solution means Topic Maps provide less context for the data than they otherwise might, which seems to run counter to the purpose of topic maps.

    Comment by clemp — January 4, 2014 @ 6:18 pm

  4. True, at least to a degree. A reader of a topic map could always add topics, or properties to topics, to cause merging to occur, so in that sense they would be reader/authors.

    But to be fair, I think the implied emphasis has always been on authors creating content as deliverables and not topic maps as interactive resources that are molded by a reader.

    That could have been true had the TMQL work been completed, but there was resistance to it. Personally, I thought the work offered a grounding that would support a TMQL accommodating any number of merging models. You would have to ask someone who disagreed with the work what they disliked about it. (There is a Perl implementation of most of it, if you are interested: http://search.cpan.org/dist/TM/)

    Comment by Patrick Durusau — January 7, 2014 @ 11:11 am

  5. “I think the implied emphasis has always been on authors creating content as deliverables and not topic maps as interactive resources that are molded by a reader.”

    That has been the impression I have been getting since I started learning topic maps, but I thought I might not have understood some part of the stack. I guess the part I was missing was TMQL.

    It is a shame that it was never finished. Without it, Topic Maps seem like mostly a static (but flexible) publishing medium. Topic Maps always struck me as having the right balance between structure/flexibility and simplicity/complexity. But, in an interactive world, do we need another static publishing medium?

    HTML already provides a way for me to read unstructured information structured the way the author wants to present it. SQL databases already provide a way to view structured data, as long as I’m willing to use the model someone else decided on. RDF already provides a way to work with semi-structured data…using ontologies defined by committee that may or may not have had my use-case in mind. What is still missing is a way to filter complex, highly linked information to focus on some aspect or connections that the original authors may not even have conceived of.

    Comment by clemp — January 7, 2014 @ 11:26 pm
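
To make the mechanical view in comment 2 concrete, here is a minimal sketch in Python, not the API of any actual topic map engine. The topic shape, the subject_identifiers field, and the merge test are illustrative assumptions rather than the TMDM or any shipping implementation; the point is only that the engine compares tokens, never meanings.

    def merge_topics(topics):
        """Merge topics that share at least one subject identifier."""
        merged = []
        for topic in topics:
            target = next(
                (t for t in merged
                 if t["subject_identifiers"] & topic["subject_identifiers"]),
                None,
            )
            if target is None:
                merged.append({
                    "names": set(topic["names"]),
                    "subject_identifiers": set(topic["subject_identifiers"]),
                })
            else:
                # No semantics here: a shared identifier token means "merge."
                # The rule is meaningful only to the reader who authored it.
                target["names"] |= topic["names"]
                target["subject_identifiers"] |= topic["subject_identifiers"]
        return merged

    topics = [
        {"names": {"Mark Twain"},
         "subject_identifiers": {"http://example.org/twain"}},
        {"names": {"Samuel Clemens"},
         "subject_identifiers": {"http://example.org/twain"}},
    ]
    print(merge_topics(topics))
    # One topic carrying both names: the engine matched tokens, nothing more.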
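
And for comment 3’s suggestion of reader-selected merging, a sketch along the same lines (again, every name here is invented): merge rules become opaque predicates a reader chooses among, rather than decisions baked into the map by the author.

    from itertools import combinations

    # Each rule is an opaque predicate: "should these two topics merge?"
    # The engine never interprets a rule; a reader simply selects one.
    MERGE_RULES = {
        "same_identifier": lambda a, b: bool(
            a["subject_identifiers"] & b["subject_identifiers"]),
        "same_name": lambda a, b: bool(a["names"] & b["names"]),
    }

    def pairs_to_merge(topics, rule_name):
        """Return the pairs of topics the reader-chosen rule says to merge."""
        rule = MERGE_RULES[rule_name]
        return [(a, b) for a, b in combinations(topics, 2) if rule(a, b)]

    topics = [
        {"names": {"Twain"},
         "subject_identifiers": {"http://example.org/twain"}},
        {"names": {"Twain"},
         "subject_identifiers": {"http://example.org/clemens"}},
    ]
    print(len(pairs_to_merge(topics, "same_name")))        # 1: names match
    print(len(pairs_to_merge(topics, "same_identifier")))  # 0: identifiers differ

One reader can review the map under same_identifier for precision, another under same_name for recall; the map itself never changes, only the reading of it.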
