Pentaho open sources ‘big data’ integration tools under Apache 2.0

Wednesday, February 1st, 2012

Chris Kanaracus writes:

Business intelligence vendor Pentaho is releasing as open source a number of tools related to “big data” in the 4.3 release of its Kettle data-integration platform and has moved the project overall to the Apache 2.0 license, the company announced Monday.

While Kettle had always been available in a community edition at no charge, the tools being open sourced were previously only available in the company’s commercialized edition. They include integrations for Hadoop’s file system and MapReduce as well as connectors to NoSQL databases such as Cassandra and MongoDB.

Those technologies are some of the most popular tools associated with the analysis of “big data,” an industry buzzword referring to the ever-larger amounts of unstructured information being generated by websites, sensors and other sources, along with transactional data from enterprise applications.

The big data components will still be offered as part of a commercial package, Pentaho Business Analytics Enterprise Edition, which bundles in tech support, maintenance and additional functionality, said Doug Moran, company co-founder and big data product manager.

Who would have thought as recently as two years ago that big data analysis would face an embarrassment of open source riches?

Even though these tools are “open source,” production use of any of them in a big data environment requires a substantial investment of human and technical resources.

I see the usual promotional webinars, but for unstructured data, why don’t we see the usual suspects in competitions like TREC?

Ranking in such an event should not be the only consideration, but it would at least be a public test of the various software offerings.