Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 24, 2014

Announcing Apache Hive 0.14

Filed under: Hadoop,Hive — Patrick Durusau @ 3:51 pm

Announcing Apache Hive 0.14 by Gunther Hagleitner.

From the post:

While YARN has allowed new engines to emerge for Hadoop, the most popular integration point with Hadoop continues to be SQL and Apache Hive is still the de facto standard. Although many SQL engines for Hadoop have emerged, their differentiation is being rendered obsolete as the open source community surrounds and advances this key engine at an accelerated rate.

Last week, the Apache Hive community released Apache Hive 0.14, which includes the results of the first phase in the Stinger.next initiative and takes Hive beyond its read-only roots and extends it with ACID transactions. Thirty developers collaborated on this version and resolved more than 1,015 JIRA issues.

Although there are many new features in Hive 0.14, there are a few we’d like to highlight. For the complete list of features, improvements, and bug fixes, see the release notes.

If you have been watching the work on Spark + Hive (see Apache Hive on Apache Spark: The First Demo), then you know how important Hive is to the Hadoop ecosystem.

The highlights:

Transactions with ACID semantics (HIVE-5317)

Allows users to modify data using insert, update and delete SQL statements. This provides snapshot isolation and uses locking for writes. Now users can make corrections to fact tables and changes to dimension tables.
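To make the feature concrete, here is a minimal sketch of the statements involved. The Python below only assembles HiveQL text; it does not connect to Hive. The table name dim_customer, its columns, and the bucket count are hypothetical. The session settings, and the requirement that a transactional table be bucketed, stored as ORC, and flagged 'transactional', reflect my understanding of Hive 0.14 and should be checked against the Hive transactions documentation before use.

```python
# Illustrative only: builds HiveQL text for a hypothetical ACID table.
# Nothing here talks to a Hive server.

# Session settings commonly listed for enabling transactions in Hive 0.14
# (an assumption to verify against your cluster's documentation).
ACID_SESSION_SETTINGS = [
    "SET hive.support.concurrency=true",
    "SET hive.enforce.bucketing=true",
    "SET hive.exec.dynamic.partition.mode=nonstrict",
    "SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
]


def acid_table_ddl(table, columns, bucket_column, buckets=4):
    """Return a CREATE TABLE statement for a bucketed, transactional ORC table."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"CLUSTERED BY ({bucket_column}) INTO {buckets} BUCKETS "
        "STORED AS ORC "
        "TBLPROPERTIES ('transactional'='true')"
    )


def acid_example_statements():
    """Settings, DDL, then one insert, one update, and one delete."""
    ddl = acid_table_ddl(
        "dim_customer", [("id", "INT"), ("city", "STRING")], "id"
    )
    dml = [
        "INSERT INTO TABLE dim_customer VALUES (1, 'Atlanta'), (2, 'Boston')",
        # The update touches a non-bucketing column only.
        "UPDATE dim_customer SET city = 'Covington' WHERE id = 1",
        "DELETE FROM dim_customer WHERE id = 2",
    ]
    return ACID_SESSION_SETTINGS + [ddl] + dml


if __name__ == "__main__":
    for statement in acid_example_statements():
        print(statement + ";")
```

Running the script prints the statements in order, ready to paste into a Hive session for experimentation.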

Cost Based Optimizer (CBO) (HIVE-5775)

Now the query compiler uses a more sophisticated cost-based optimizer that generates query plans based on statistics on data distribution. This works really well with complex joins and joins with multiple large fact tables. The CBO generates bushy plans that execute much faster.
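Because the optimizer depends on statistics, it only helps if those statistics have been gathered. The following is a minimal sketch, assuming a hypothetical table named sales with columns customer_id and amount. The Python only builds HiveQL text. The property names are the ones I understand to control the optimizer and statistics in Hive 0.14; treat them as assumptions to confirm against the Hive configuration documentation.

```python
# Illustrative only: builds the HiveQL needed to switch on the cost-based
# optimizer and collect the statistics it relies on.


def cbo_setup_statements(table, columns):
    """Return settings plus ANALYZE statements for one (hypothetical) table."""
    column_list = ", ".join(columns)
    return [
        "SET hive.cbo.enable=true",
        "SET hive.compute.query.using.stats=true",
        "SET hive.stats.fetch.column.stats=true",
        "SET hive.stats.fetch.partition.stats=true",
        # Table-level statistics (row counts, sizes).
        f"ANALYZE TABLE {table} COMPUTE STATISTICS",
        # Column-level statistics for the listed columns.
        f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS {column_list}",
    ]


if __name__ == "__main__":
    for statement in cbo_setup_statements("sales", ["customer_id", "amount"]):
        print(statement + ";")
```

Prefixing a join query with EXPLAIN before and after running these statements is one way to see whether the plan actually changes.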

SQL Temporary Tables (HIVE-7090)

Temporary tables exist in scratch space that goes away when the user session disconnects. This allows users and BI tools to store temporary results and further process that data with multiple queries.
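A minimal sketch of that pattern: store an intermediate result once, then query it several times within the same session. The table names (sales, tmp_big_spenders) and the queries are hypothetical, and the Python only assembles HiveQL text rather than running it.

```python
# Illustrative only: builds HiveQL for a temporary-table workflow.
# The temporary table would disappear when the Hive session ends.


def temp_table_statements(temp_name, source_query, follow_up_queries):
    """Return a CREATE TEMPORARY TABLE ... AS statement plus later queries."""
    create = f"CREATE TEMPORARY TABLE {temp_name} AS {source_query}"
    return [create] + list(follow_up_queries)


def example_session():
    """Hypothetical session: one stored result, two queries against it."""
    return temp_table_statements(
        "tmp_big_spenders",
        "SELECT customer_id, SUM(amount) AS total "
        "FROM sales GROUP BY customer_id",
        [
            "SELECT COUNT(*) FROM tmp_big_spenders WHERE total > 1000",
            "SELECT customer_id FROM tmp_big_spenders "
            "ORDER BY total DESC LIMIT 10",
        ],
    )


if __name__ == "__main__":
    for statement in example_session():
        print(statement + ";")
```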

Coming Next in Stinger.next: Sub-Second Queries

After Hive 0.14, we’re planning on working with the community to deliver sub-second queries and SQL:2011 Analytics coverage in Hive. We also plan to work on Hive-Spark integration for machine learning and operational reporting with Hive streaming ingest and transactions.

Hive is an example of how an open source project should be supported.
