Archive for the ‘RHIPE’ Category

Tessera

Tuesday, November 4th, 2014

Tessera

From the webpage:

The Tessera computational environment is powered by a statistical approach, Divide and Recombine. At the front end, the analyst programs in R. At the back end is a distributed parallel computational environment such as Hadoop. In between are three Tessera packages: datadr, Trelliscope, and RHIPE. These packages enable the data scientist to communicate with the back end with simple R commands.

Divide and Recombine (D&R)

Tessera is powered by Divide and Recombine. In D&R, we seek meaningful ways to divide the data into subsets, apply statistical methods to each subset independently, and recombine the results of those computations in a statistically valid way. This enables us to use the existing vast library of methods available in R – no need to write scalable versions.

DATADR

The datadr R package provides a simple interface to D&R operations. The interface is back end agnostic, so that as new distributed computing technology comes along, datadr will be able to harness it. Datadr currently supports in-memory, local disk / multicore, and Hadoop back ends, with experimental support for Apache Spark. Regardless of the back end, coding is done entirely in R and data is represented as R objects.

TRELLISCOPE

Trelliscope is a D&R visualization tool based on Trellis Display that enables scalable, flexible, detailed visualization of data. Trellis Display has repeatedly proven itself as an effective approach to visualizing complex data. Trelliscope, backed by datadr, scales Trellis Display, allowing the analyst to break potentially very large data sets into many subsets, apply a visualization method to each subset, and then interactively sample, sort, and filter the panels of the display on various quantities of interest.

RHIPE

RHIPE is the R and Hadoop Integrated Programming Environment. RHIPE allows an analyst to run Hadoop MapReduce jobs wholly from within R. RHIPE is used by datadr when the back end for datadr is Hadoop. You can also perform D&R operations directly through RHIPE, although in this case you are programming at a lower level.

Quite an impressive package for R and “big data.”
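To make the Divide and Recombine description quoted above concrete, here is a minimal sketch of the three steps in plain Python. It is not the datadr or RHIPE API: the function names (divide, apply_method, recombine), the "site" and "value" fields, and the sample rows are all hypothetical, chosen only to illustrate the pattern on a small in-memory list.

```python
from statistics import mean

def divide(rows, key):
    """Divide: group rows into subsets by the value of `key`."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[key], []).append(row)
    return subsets

def apply_method(subset):
    """Apply: run an ordinary, non-distributed method on one subset."""
    values = [row["value"] for row in subset]
    return {"n": len(values), "mean": mean(values)}

def recombine(results):
    """Recombine: size-weighted average of the per-subset means."""
    total = sum(r["n"] for r in results.values())
    return sum(r["n"] * r["mean"] for r in results.values()) / total

# Hypothetical sample data; a real back end would hold far more rows.
rows = [
    {"site": "a", "value": 1.0},
    {"site": "a", "value": 3.0},
    {"site": "b", "value": 5.0},
    {"site": "b", "value": 7.0},
    {"site": "b", "value": 9.0},
]

subsets = divide(rows, "site")
per_subset = {k: apply_method(s) for k, s in subsets.items()}
overall = recombine(per_subset)
```

Each call to apply_method touches only its own subset, which is what lets a back end such as Hadoop run the subsets in parallel. Weighting by subset size in the recombine step is what makes the combined mean agree with the mean of the undivided data; other statistics need their own recombination rule.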

I first saw this in a tweet by Christophe Lalanne.

RHIPE: An Interface Between Hadoop and R for Large and Complex Data Analysis

Friday, February 25th, 2011

RHIPE: An Interface Between Hadoop and R for Large and Complex Data Analysis

Enables processing with R across data sets too large to load.

But you have to watch the video to see the retrieval from 14 GB of data that had been produced using RHIPE, or the 145 GB of SSH traffic from the Department of Homeland Security.

Very impressive.