Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 17, 2012

Improving the integration between R and Hadoop: rmr 2.0 released

Filed under: Hadoop,R,RHadoop — Patrick Durusau @ 9:14 am

Improving the integration between R and Hadoop: rmr 2.0 released

David Smith reports:

The RHadoop project, the open-source project supported by Revolution Analytics to integrate R and Hadoop, continues to evolve. Now available is version 2 of the rmr package, which makes it possible for R programmers to write map-reduce tasks in the R language, and have them run within the Hadoop cluster. This update is the "simplest and fastest rmr yet", according to lead developer Antonio Piccolboni. While previous releases added performance-improving vectorization capabilities to the interface, this release simplifies the API while still improving performance (for example, by using native serialization where appropriate). This release also adds some convenience functions, for example for taking random samples from Big Data stored in Hadoop. You can find further details of the changes here, and download RHadoop here.

RHadoop Project: Changelog

As you know, I’m not one to complain, ;-), but I read from above:

…this release simplifies the API while still improving performance [a good thing]

as contradicting the release notes that read in part:

…At the same time, we increased the complexity of the API. With this version we tried to define a synthesis between all the modes (record-at-a-time, vectorized and structured) present in 1.3, with the following goals:

  • bring the footprint of the API back to 1.2 levels.
  • make sure that no matter what the corner of the API one is exercising, he or she can rely on simple properties and invariants; writing an identity mapreduce should be trivial.
  • encourage writing the most efficient and idiomatic R code from the start, as opposed to writing against a simple API first and then developing a vectorized version for speed.

After reading the change notes, I’m going with the “simplifies the API” riff.

Take a close look and see what you think.
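If it helps with that close look, here is a toy sketch of two ideas the release notes lean on: an identity mapreduce that needs no arguments beyond its input, and the difference between a record-at-a-time map and a vectorized one. It is plain Python running in memory, not the rmr API; every name in it (`mapreduce`, `mapreduce_batched`, `square_record`, `square_batch`) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mapreduce(records, map_fn=None, reduce_fn=None):
    """Toy in-memory map-reduce over (key, value) pairs. NOT the rmr API."""
    # With no map supplied, every record passes through unchanged (identity).
    if map_fn is None:
        map_fn = lambda key, value: [(key, value)]
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    if reduce_fn is None:
        # No reduce: emit the mapped pairs, grouped by key.
        return [(k, v) for k, vs in groups.items() for v in vs]
    return [(k, reduce_fn(k, vs)) for k, vs in groups.items()]


def mapreduce_batched(records, map_batch, reduce_fn=None):
    """Same job, but the map sees all keys and values in a single call."""
    keys = [k for k, _ in records]
    values = [v for _, v in records]
    out_keys, out_values = map_batch(keys, values)
    return mapreduce(list(zip(out_keys, out_values)), reduce_fn=reduce_fn)


def square_record(key, value):
    # Record-at-a-time style: called once per record.
    return [(key, value * value)]


def square_batch(keys, values):
    # Vectorized style: called once for the whole batch.
    return keys, [v * v for v in values]


def total(key, values):
    return sum(values)


records = [("a", 1), ("b", 2), ("a", 3)]
identity = mapreduce(records)
per_record = mapreduce(records, map_fn=square_record, reduce_fn=total)
batched = mapreduce_batched(records, square_batch, reduce_fn=total)
```

Both styles give the same totals here; the vectorized version simply makes one map call instead of one per record, which is the kind of saving the release notes appear to have in mind when they encourage writing the efficient form from the start.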
