Ten Trends in Data Science 2015

Ten Trends in Data Science 2015 by Kurt Cagle.

From the post:

There is a certain irony talking about trends in data science, as much of data science is geared primarily to detecting and extrapolating trends from disparate data patterns. In this case, this is part of a series of analyses I’ve written for over a decade, looking at what I see as the key areas that most heavily impact the area of technology I’m focusing on at the top. For the last few years, this has been a set of technologies which have increasingly been subsumed under the rubric of Data Science.

I tend to use the term to embrace an understanding of four key areas – Data Acquisition (how you get data into a usable form and set of stores or services), Data Awareness (how you provide context to this data so that it can work more effectively across or between enterprises), Data Analysis (turning this aware data into usable information for decision makers and data consumers) and Data Governance (establishing the business structures, provenance maintenance and continuity for that data). These I collectively call the Data Cycle, and it seems to be the broad arc that most data (whether Big Data or Small Data) follows in its life cycle. I’ll cover this cycle in more detail later, but for now, it provides a reasonably good scope for what I see as the trends that are emerging in this field.

This has been a remarkably good year in the field of data science – the Big Data field both matured and spawned a few additional areas of study, semantics went from being an obscure term to getting attention in the C-Suite and the demand for good data visualizers went from tepid to white hot.

A great overview of what is likely to be “hot” in 2015.

I disagree with Kurt when he says:


Over the course of the next year, this major upgrade to the SPARQL standard will become the de facto mechanism for communicating with triple stores, which will in turn drive the utilization of new semantics-based applications.

Semantics already figure pretty heavily in recommendation engines and similar applications, since these kinds of applications deal more heavily with searching and making connections between types of resources, and it plays fairly heavily in areas such as machine learning and NLP.

I don’t dispute that semantics is an area where large strides, and large profits, could be made. What I dispute is that SPARQL and triple stores are going to play a meaningful role in semantics, especially for recommendation engines, machine learning and NLP.

The “semantics” that recommendation engines mine are entirely unknown to the recommendation engine. Such an engine ingests a large amount of data and, without making any explicit semantic choice, recommends a product to a user based on previous choices by that user and others. It is an entirely mechanical operation that has no sense of “semantics” at all. Semantic “understanding” isn’t required for Netflix or Amazon to do a pretty good job of recommending items to customers.

In terms of a recommendation, I seriously doubt a recommendation engine relies upon two items having a part-whole or class-subclass relationship. It relies upon observed shopping/consumption behavior, which may or may not have any internal coherence at all. What matters to a vendor is that a sale is made, semantics be damned.
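
To make the point concrete, here is a minimal sketch of a co-occurrence recommender in Python. It is only an illustration of the "mechanical" idea, not a description of how Netflix or Amazon actually work; the function names (`build_cooccurrence`, `recommend`) and the sample baskets are invented for this example. Note that nothing in it knows what a tent or a stove is, or how the two relate: it only counts which items turn up together.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of items shares a basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        # set() so that a repeated item in one basket is counted once
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, history, k=3):
    """Rank items the user has not chosen by co-occurrence with their history."""
    scores = Counter()
    for item in history:
        for other, count in co.get(item, {}).items():
            if other not in history:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]


# Hypothetical purchase data: each list is one customer's basket.
baskets = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "sleeping bag"],
    ["tent", "novel"],
    ["stove", "coffee"],
]

co = build_cooccurrence(baskets)
print(recommend(co, {"tent"}, k=1))  # ['sleeping bag']
```

The recommendation falls out of the counts alone. No class hierarchy, no part-whole relation and no triple store is consulted at any point.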

Other than that quibble, Kurt is predicting what most people anticipate seeing next year. Now for the fun part: seeing how the future develops in interesting and unpredictable ways.
