SPARQL and Big Data (and NoSQL) by Bob DuCharme.
From the post:
How to pursue the common ground?
I think it’s obvious that SPARQL and other RDF-related technologies have plenty to offer to the overlapping worlds of Big Data and NoSQL, but this doesn’t seem as obvious to people who focus on those areas. For example, the program for this week’s Strata conference makes no mention of RDF or SPARQL. The more I look into it, the more I see that this flexible, standardized data model and query language align very well with what many of those people are trying to do.
But, we semantic web types can’t blame them for not noticing. If you build a better mouse trap, the world won’t necessarily beat a path to your door, because they have to find out about your mouse trap and what it does better. This requires marketing, which requires talking to those people in language that they understand, so I’ve been reading up on Big Data and NoSQL in order to better appreciate what they’re trying to do and how.
A great place to start is the excellent (free!) booklet Planning for Big Data by Edd Dumbill. (Others contributed a few chapters.) For a start, he describes data that “doesn’t fit the strictures of your database architectures” as a good candidate for Big Data approaches. That’s a good start for us. Here are a few longer quotes that I found interesting, starting with these two paragraphs from the section titled “Ingesting and Cleaning” after a discussion about collecting data from multiple different sources (something else that RDF and SPARQL are good at):
Bob has a very good point: marketing “…requires talking to those people in language that they understand….”
That is, no matter how “good” we think a solution may be, it won’t interest others until we explain it in terms they “get.”
But “marketing” requires more than a lingua franca.
Once an offer is made and understood, it must interest the other person, or it is very poor marketing.
We may think that any sane person would jump at the chance to reduce the time and expense of data cleaning. But that isn’t necessarily the case.
I once made a proposal that would substantially reduce the time and expense of maintaining membership records: hard-copy records that spanned decades and were growing every year. I thought it would be well received.
Hardly. I was called into my manager’s office and got a lecture on how the department in question had more staff, a larger budget, etc., than any other department. I was told that they had no interest whatsoever in my proposal and that I should not presume to offer further advice. (Years later my suggestion was adopted when budget pressures forced the issue.)
Efficient information flow interested me but not management, for whom a large staff and budget counted for more.
Bob and the rest of us need to ask the traditional question: Cui bono? (To whose benefit?)
Semantic technologies, just like any other, have winners and losers.
To effectively market our wares, we need to identify both.