Hive-on-Spark « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 25, 2015

Download the Hive-on-Spark Beta

Filed under: Hive,Hive-on-Spark,Spark — Patrick Durusau @ 4:00 pm

Download the Hive-on-Spark Beta by Xuefu Zhang.

From the post:

The Hive-on-Spark project (HIVE-7292) is one of the most watched projects in Apache Hive history. It has attracted developers from across the ecosystem, including from organizations such as Intel, MapR, IBM, and Cloudera, and gained critical help from the Spark community.

Many anxious users have inquired about its availability in the last few months. Some users even built Hive-on-Spark from the branch code and tried it in their testing environments, and then provided us valuable feedback. The team is thrilled to see this level of excitement and early adoption, and has been working around the clock to deliver the product at an accelerated pace.

Thanks to this hard work, significant progress has been made in the last six months. (The project is currently incubating in Cloudera Labs.) All major functionality is now in place, including different flavors of joins and integration with Spark, HiveServer2, and YARN, and the team has made initial but important investments in performance optimization, including split generation and grouping, supporting vectorization and cost-based optimization, and more. We are currently focused on running benchmarks, identifying and prototyping optimization areas such as dynamic partition pruning and table caching, and creating a roadmap for further performance enhancements for the near future.

Two month ago, we announced the availability of an Amazon Machine Image (AMI) for a hands-on experience. Today, we even more proudly present you a Hive-on-Spark beta via CDH parcel. You can download that parcel here. (Please note that in this beta release only HDFS, YARN, Apache ZooKeeper, and Hive are supported. Other components, such as Apache Pig, Apache Oozie, and Impala, might not work as expected.) The “Getting Started” guide will help you get your Hive queries up and running on the Spark engine without much trouble.

We welcome your feedback. For assistance, please use user@hive.apache.org or the Cloudera Labs discussion board.

We will update you again when GA is available. Stay tuned!

If you are snowbound this week, this may be what you have been looking for!

I have listed this under both Hive and Spark separately but am confident enough of its success that I created Hive-on-Spark as well.

Enjoy!

Comments Off