Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 23, 2014

How Companies are Using Spark

Filed under: BigData,Hadoop,Spark — Patrick Durusau @ 7:50 pm

How Companies are Using Spark, and Where the Edge in Big Data Will Be by Matei Zaharia.

Description:

While the first big data systems made a new class of applications possible, organizations must now compete on the speed and sophistication with which they can draw value from data. Future data processing platforms will need not just to scale cost-effectively, but also to allow ever more real-time analysis and to support both simple queries and today’s most sophisticated analytics algorithms. Through the Spark project at Apache and Berkeley, we’ve brought six years of research to enable real-time and complex analytics within the Hadoop stack.

At the 1:53 mark, Matei says that when the size of your storage is no longer an advantage, you can gain an advantage through:

Speed: how quickly can you go from data to decisions?

Sophistication: can you run the best algorithms on the data?

As you might suspect, I strongly disagree that those are the only two points where you can gain an advantage with Big Data.

How about including:

Data Quality: how do you make data semantics explicit?

Data Management: can you re-use data by knowing its semantics?

You can run sophisticated algorithms on data and make quick decisions, but if your data is GIGO (garbage in, garbage out), I don’t see where the competitive edge comes from.

Nothing against Spark; managing video streams with only one second of buffering was quite impressive.

To be fair, Matei does include ClearStory Data as one of his examples, and ClearStory says that it merges data based on its semantics. Unfortunately, the website doesn’t give any details beyond the fact that the approach is “patent pending.”

But in any event, I do think data quality and data management should be explicit items in any big data strategy.

At least so long as you want big data and not big garbage.
