Announcing SparkR: R on Spark by Shivaram Venkataraman.
From the post:
I am excited to announce that the upcoming Apache Spark 1.4 release will include SparkR, an R package that allows data scientists to analyze large datasets and interactively run jobs on them from the R shell.
R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks. However, interactive data analysis in R is usually limited as the runtime is single-threaded and can only process data sets that fit in a single machine’s memory. SparkR, an R package initially developed at the AMPLab, provides an R frontend to Apache Spark and using Spark’s distributed computation engine allows us to run large scale data analysis from the R shell.
…
Read the short news here, or go to the Spark Summit to get the full story. (Code Databricks20 gets a 20% discount.) (That's next week, June 15–17, in San Francisco, so you need to act quickly.)
BTW, you can register for free live streaming!
Looking forward to this!