Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 25, 2013

Integrating R with Cloudera Impala…

Filed under: Cloudera,Impala,R — Patrick Durusau @ 8:00 pm

Integrating R with Cloudera Impala for Real-Time Queries on Hadoop by Istvan Szegedi.

From the post:

Cloudera Impala supports low-latency, interactive queries on Hadoop data sets stored either in the Hadoop Distributed File System (HDFS) or in HBase, the distributed NoSQL database for Hadoop. Impala’s notion is to use Hadoop as a storage engine but move away from MapReduce algorithms. Instead, Impala uses distributed queries, a concept inherited from massively parallel processing databases. As a result, Impala supports a SQL-like query language (in the same way as Apache Hive), but can execute the queries 10-100 times faster than Hive, which converts them into MapReduce. You can find more details on Impala in one of the previous posts.

R is one of the most popular open source environments for statistical computing and graphics. It can work with various data sources, from comma-separated files to web content referenced by URLs to relational databases to NoSQL (e.g. MongoDB or Cassandra) and Hadoop.

Thanks to the generic Impala ODBC driver, R can be integrated with Impala, too. The solution provides fast, interactive queries running on top of Hadoop data sets, and the results can then be further processed or visualized within R.

Have you noticed that newer technologies (Hadoop) are becoming accessible to more traditional tools (R)?

That will move traditional tool users towards newer technologies.

The combining of the new with the traditional has a distinct odor.

I think it is called success. 😉
