I was puzzled by the following summary of investing in Big Data’s “critical needs”:
Recognizing the potential of data exploitation in today’s business environment, my firm, Trident Capital, continues to look for investments in three areas that address Big Data’s critical needs:
- People. Data scientists and business analysts have emerged as the critical personnel for the analysis and utilization of data. We are looking for services companies with large teams of such people and scalable analytic processes.
- Tools. Today’s Big Data analysis tools remain too low-level, requiring analysts to perform many tasks manually. For this reason, we are searching for new technologies to help with data ingestion, manipulation, and exploitation.
- Applications. Businesses are still not able to capitalize on the results of the performed analyses in a timely manner, often missing important opportunities. We are searching for companies developing analytic applications that enable business users to act on the Big Data sets their organizations collect. Example applications include customer experience data analysis that enables organizations to offer the right level of customer support, Internet of Things data analysis to optimize supply chains, and applications that use analysis results to assist professionals with complex tasks, such as a doctor during diagnosis.
The post, “Presentation on Big Data Analytics and Watson at IBM’s Information on Demand Conference,” makes assumptions about “big data” that are subject to change.
For example, consider the first two points: the need for scalable service organizations and for tools that reduce manual work with data. Both are related, and both are directly impacted by the quality of the data in question. It may be “big data,” but if it is “clean” data, the need for scaled service organizations and manual manipulation goes down, as the sketch below illustrates.
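To make that concrete, here is a minimal sketch of validation at the point of data production. The field names (“email,” “signup_date”) and the rules are hypothetical, not drawn from the post above; the point is only that a few checks applied when a record is captured remove whole classes of “dirt” before they ever reach an analyst.

```python
import re
from datetime import datetime

# Hypothetical rule: a crude email shape check, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_at_source(record: dict) -> dict:
    """Reject or normalize a record at the moment it is produced."""
    errors = []

    # Normalize casing and whitespace instead of cleaning it up later.
    email = record.get("email", "").strip().lower()
    if not EMAIL_RE.match(email):
        errors.append("malformed email")

    try:
        # Force one canonical date format instead of letting "3/4/13",
        # "2013-03-04", and "Mar 4 2013" all enter the data set.
        date = datetime.strptime(
            record.get("signup_date", ""), "%Y-%m-%d"
        ).date().isoformat()
    except ValueError:
        errors.append("signup_date not in YYYY-MM-DD form")
        date = None

    if errors:
        raise ValueError(f"rejected at source: {errors}")
    return {"email": email, "signup_date": date}

# A bad record never reaches the store; a good one arrives normalized:
# validate_at_source({"email": " JANE@EXAMPLE.COM ", "signup_date": "2013-03-04"})
# -> {"email": "jane@example.com", "signup_date": "2013-03-04"}
```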
But the presumption underlying this analysis is that we have “dirty big data” and there isn’t anything we can do about it. Really?
What if data, when produced by human or automated means, were less “dirty,” if not in fact “clean”? At least for some purposes. Consider:
- Choose a “dirty” data set.
- What steps in its production would lessen the amount of dirt in the data?
- What would you suggest as ways to evaluate the cost vs. benefit of cleaner data? (One rough shape for that comparison is sketched below.)
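On that last question, one framing is a small per-record cost paid at capture time versus analyst time paid later to find and fix dirty records. The numbers below are entirely made up; the sketch only shows the shape of the calculation, not a real estimate.

```python
def net_benefit_of_cleaning_at_source(
    records: int,
    validation_cost_per_record: float,     # hypothetical: extra capture-time cost
    downstream_fix_cost_per_record: float, # hypothetical: analyst cleanup cost
    dirty_fraction: float,                 # share of records that arrive dirty
) -> float:
    """Positive result favors cleaning at the source, under these assumptions."""
    cost_at_source = records * validation_cost_per_record
    cost_downstream = records * dirty_fraction * downstream_fix_cost_per_record
    return cost_downstream - cost_at_source

# Example: 10M records, $0.0001 to validate each at capture, 5% arrive dirty,
# $0.05 of analyst time to fix each dirty record after the fact.
print(net_benefit_of_cleaning_at_source(10_000_000, 0.0001, 0.05, 0.05))
# -> 24000.0, i.e. $24,000 in favor of cleaning at the source (given these numbers)
```

The interesting work, of course, is in estimating those parameters for a particular “dirty” data set rather than in the arithmetic itself.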