Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 21, 2013

Simple Hive ‘Cheat Sheet’ for SQL Users

Filed under: Hive,SQL — Patrick Durusau @ 4:51 pm

Simple Hive ‘Cheat Sheet’ for SQL Users by Marc Holmes.

From the post:

If you’re already familiar with SQL then you may well be thinking about how to add Hadoop skills to your toolbelt as an option for data processing.

From a querying perspective, using Apache Hive provides a familiar interface to data held in a Hadoop cluster and is a great way to get started. Apache Hive is data warehouse infrastructure built on top of Apache Hadoop for providing data summarization, ad-hoc query, and analysis of large datasets. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).

Naturally, there are a bunch of differences between SQL and HiveQL, but on the other hand there are a lot of similarities too, and recent releases of Hive bring that SQL-92 compatibility closer still.

To highlight that – and as a bit of fun to get started – below is a simple ‘cheat sheet’ (based on a simple MySQL reference such as this one) for getting started with basic querying for Hive. Here, we’ve done a direct comparison to MySQL, but given the simplicity of these particular functions, then it should be the same in essentially any SQL dialect.

Of course, if you really want to get to grips with Hive, then take a look at the full language manual.
(…)

Definitely going to print this cheat sheet out and put it in plastic.

A top-of-the-desk sort of reference.
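
To give a flavor of the claim that basic querying "should be the same in essentially any SQL dialect," here is a minimal sketch. It is not taken from the cheat sheet itself. The table name `page_views`, its columns, and the sample rows are made up for illustration, and SQLite (via Python's standard `sqlite3` module) stands in for a real engine, so nothing here has been run against Hive or MySQL. The statements stick to basic clauses (WHERE, DISTINCT, GROUP BY, ORDER BY, LIMIT) that, as far as I know, both MySQL and HiveQL accept in this form.

```python
import sqlite3

# SQLite is only a stand-in engine; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [("ann", "/home", 3), ("bob", "/home", 1), ("ann", "/docs", 5), ("cat", "/docs", 2)],
)

# Query text limited to basic clauses that are assumed to be shared
# between MySQL and HiveQL (not verified against either engine here).
QUERIES = {
    "filter": "SELECT user_id, url FROM page_views WHERE hits > 2",
    "distinct": "SELECT DISTINCT url FROM page_views ORDER BY url",
    "aggregate": "SELECT url, SUM(hits) AS total FROM page_views "
                 "GROUP BY url ORDER BY total DESC",
    "top_n": "SELECT user_id, hits FROM page_views ORDER BY hits DESC LIMIT 2",
}


def run_all(connection, queries):
    """Run each named query and collect its rows as a list of tuples."""
    return {name: connection.execute(sql).fetchall() for name, sql in queries.items()}


results = run_all(conn, QUERIES)
for name, rows in results.items():
    print(name, rows)
```

Where the dialects do part ways (data loading, table definitions, and so on), the full language manual is the place to check rather than a sketch like this one.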
