Hadoop, the Big Data processing and analytics framework, isn’t your average open source project.

“If you look at a lot of the open source software that’s been popular out of Apache and elsewhere, it’s sort of like an open source replacement for something you can already get elsewhere,” said Todd Lipcon, a senior software engineer at Cloudera. “I think Hadoop is kind of unique in that it’s the only option for doing this kind of analysis.”

Lipcon is right. OpenOffice is an open source office suite alternative to Microsoft Office. MySQL is an open source database alternative to Oracle. Hadoop is an open source Big Data framework alternative to …. Well, there is no alternative.

Now that MS has released Daytona, its MapReduce runtime for Windows Azure, along with Excel DataScope, it would be interesting to know how Todd Lipcon sees the ease-of-use issue.

Powerful technology (LaTeX anyone?) may far exceed the capabilities of (insert your favorite word processor), but if it is too difficult to use, poorer alternatives will occupy most of the field.

That may give people with the more powerful technology a righteous feeling, but I am not interested in feeling righteous.

I am interested in winning, which means having a powerful technology that can be used by a wide variety of users of varying skill levels.

Some will use it poorly, barely invoking its capabilities. Others will make good but unimaginative use of it. Still others will push the envelope in terms of what it can do. All are legitimate and all are valuable in their own way.

From the webpage:

From the familiar interface of Microsoft Excel, Excel DataScope enables researchers to accelerate data-driven decision making. It offers data analytics, machine learning, and information visualization by using Windows Azure for data and compute-intensive tasks. Its powerful analysis techniques are applicable to any type of data, ranging from web analytics to survey, environmental, or social data.
Excel DataScope is a technology ramp between Excel on the user’s client machine, the resources that are available in the cloud, and a new class of analytics algorithms that are being implemented in the cloud. An Excel user can simply select an analytics algorithm from the Excel DataScope Research Ribbon without concern for how to move their data to the cloud, how to start up virtual machines in the cloud, or how to scale out the execution of their selected algorithm in the cloud. They simply focus on exploring their data by using a familiar client application.

Excel DataScope is an ongoing research and development project. We envision a future in which a model developer can publish their latest data analysis algorithm or machine learning model to the cloud and within minutes Excel users around the world can discover it within their Excel Research Ribbon and begin using it to explore their data collection. (emphasis added)

I added emphasis to the last sentence because that is the sort of convenience/collaboration that will make cloud computing and collaboration meaningful.

Imagine that sort of sharing across MS and non-MS cloud resources. Of course, that would require an Excel DataScope interface to non-MS cloud resources, but one hopes that will be a product offering in the near future.