Archive for the ‘Tuple MapReduce’ Category

Talend Open Studio for Big Data w/ Hadoop

Sunday, March 11th, 2012

Talend Empowers Apache Hadoop Community with Talend Open Studio for Big Data

From the post:

Talend, a global leader in open source integration software, today announced the availability of Talend Open Studio for Big Data, to be released under the Apache Software License. Talend Open Studio for Big Data is based on the world’s most popular open source integration product, Talend Open Studio, augmented with native support for Apache Hadoop. In addition, Talend Open Studio for Big Data will be bundled in Hortonworks’ leading Apache Hadoop distribution, Hortonworks Data Platform, constituting a key integration component of Hortonworks Data Platform, a massively scalable, 100 percent open source platform for storing, processing and analyzing large volumes of data.

Talend Open Studio for Big Data is a powerful and versatile open source solution for data integration that dramatically improves the efficiency of integration job design through an easy-to-use graphical development environment. Talend Open Studio for Big Data provides native support for Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive. By leveraging Hadoop’s MapReduce architecture for highly-distributed data processing, Talend generates native Hadoop code and runs data transformations directly inside Hadoop for maximum scalability. This feature enables organizations to easily combine Hadoop-based processing, with traditional data integration processes, either ETL or ELT-based, for superior overall performance.

“By making Talend Open Studio for Big Data a key integration component of the Hortonworks Data Platform, we are providing Hadoop users with the ability to move data in and out of Hadoop without having to write complex code,” said Eric Baldeschwieler, CTO & co-founder of Hortonworks. “Talend provides the most powerful open source integration solution for enterprise data, and we are thrilled to be working with Talend to provide to the Apache Hadoop community such advanced integration capabilities.”


Talend Open Studio for Big Data will be available in May 2012. A preview version of the product is available immediately at

Good news, but we also know that the Hadoop paradigm is evolving. See Tuple MapReduce: beyond the classic MapReduce.

Will early adopters of Hadoop be just as willing to migrate as the paradigm develops?

Tuple MapReduce: beyond the classic MapReduce

Thursday, March 8th, 2012

Tuple MapReduce: beyond the classic MapReduce by Pere Ferrera Bertran.

From the post:

In this post we’ll review the MapReduce model proposed by Google in 2004 and propound another one called Tuple MapReduce. We’ll see that this new model is a generalization of the first and we’ll explain what advantages it has to offer. We’ll provide a practical example and conclude by discussing when the implementation of Tuple MapReduce is advisable.

In the conclusion:

In this post we have presented a new MapReduce model, Tuple MapReduce, and we have shown its benefits and virtues. We have generalized it in order to allow joins between different data sources (Tuple-Join MapReduce). We have noted that it allows the same things to be done as the MapReduce we already know, while making it much simpler to learn and use.

We believe that an implementation of Tuple MapReduce would be advisable and that it could act as a replacement for the original MapReduce. This implementation, instead of being comparable to existing high-level tools that have been created on top of MapReduce, would be comparable in efficiency to current implementations of MapReduce.
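To make the idea a little more concrete, here is a minimal in-memory sketch of what a tuple-based MapReduce might look like. It is not the author's implementation (none had been released at the time of the post). It assumes that records are plain Python dicts standing in for tuples, and that the framework groups by one set of fields while sorting by a superset of them. The names `tuple_mapreduce`, `parse_visit` and `first_seen_and_total`, and the `url`/`date`/`clicks` fields, are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def tuple_mapreduce(records, map_fn, reduce_fn, group_by, sort_by):
    """Toy, single-process simulation of a tuple-based MapReduce.

    Assumption (not taken from the post): group_by must be a prefix of
    sort_by, so sorting on sort_by also clusters each group together.
    """
    if list(sort_by[:len(group_by)]) != list(group_by):
        raise ValueError("group_by must be a prefix of sort_by")

    # Map phase: every input record may emit zero or more tuples (dicts).
    emitted = [t for record in records for t in map_fn(record)]

    # Shuffle phase: one sort on the full sort_by field list.
    emitted.sort(key=lambda t: tuple(t[f] for f in sort_by))

    # Reduce phase: each group arrives already ordered by the extra fields.
    output = []
    group_key = lambda t: tuple(t[f] for f in group_by)
    for key, group in groupby(emitted, key=group_key):
        output.extend(reduce_fn(key, group))
    return output


def parse_visit(line):
    """Hypothetical mapper: turn 'url,date,clicks' into a named tuple (dict)."""
    url, date, clicks = line.split(",")
    yield {"url": url, "date": date, "clicks": int(clicks)}


def first_seen_and_total(key, tuples):
    """Hypothetical reducer: earliest date and total clicks per url."""
    tuples = list(tuples)
    # The first tuple carries the earliest date because of the sort order.
    yield (key[0], tuples[0]["date"], sum(t["clicks"] for t in tuples))


visits = [
    "a.com,2012-03-02,3",
    "b.com,2012-03-01,1",
    "a.com,2012-03-01,2",
]
result = tuple_mapreduce(
    visits,
    parse_visit,
    first_seen_and_total,
    group_by=["url"],
    sort_by=["url", "date"],
)
print(result)  # [('a.com', '2012-03-01', 5), ('b.com', '2012-03-01', 1)]
```

In this sketch the grouping and sorting are declared by field name, so the mapper and reducer never have to pack several values into a single key or value. That is the kind of simplification the quoted conclusion seems to be pointing at, though a real implementation on Hadoop would of course look quite different.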

The post promises open source code in the near future.

I have to admit to being interested even without working code, but that would quickly change to excitement upon successful testing of Tuple-Join MapReduce. Quite definitely the sort of mapping exercise that needs a standardized mapping language. 😉