Writing Hive UDFs – a tutorial

Monday, February 18th, 2013

Writing Hive UDFs – a tutorial by Alexander Dean.


In this article you will learn how to write a user-defined function (“UDF”) to work with the Apache Hive platform. We will start gently with an introduction to Hive, then move on to developing the UDF and writing tests for it. We will write our UDF in Java, but use Scala’s SBT as our build tool and write our tests in Scala with Specs2.

In order to get the most out of this article, you should be comfortable programming in Java. You do not need to have any experience with Apache Hive, HiveQL (the Hive query language) or indeed Hive UDFs – I will introduce all of these concepts from first principles. Experience with Scala is advantageous, but not necessary.

The example UDF isn’t impressive, so more impressive ones are left as an exercise for the reader. 😉
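For orientation, here is a minimal sketch of the shape a simple Hive UDF usually takes. It is not the tutorial's own example: the class name TrimLowerSketch and its behaviour are made up for illustration, and the Hive dependency is deliberately left out so the snippet compiles on its own. In a real UDF written against Hive's classic simple-UDF API, the class would extend org.apache.hadoop.hive.ql.exec.UDF and would typically take and return org.apache.hadoop.io.Text rather than String.

```java
import java.util.Locale;

// Hypothetical sketch, not the tutorial's UDF, and free of Hive imports.
// Assumption: in a real simple UDF this class would extend
// org.apache.hadoop.hive.ql.exec.UDF and use org.apache.hadoop.io.Text.
class TrimLowerSketch {

    // Hive's simple UDF API locates a method named "evaluate" by reflection,
    // so the method name matters even though no interface declares it.
    public String evaluate(String input) {
        // Returning null for null input mirrors how SQL NULL usually propagates.
        if (input == null) {
            return null;
        }
        // Locale.ROOT keeps the lower-casing independent of the JVM's default locale.
        return input.trim().toLowerCase(Locale.ROOT);
    }
}
```

Once such a class is packaged into a jar, it would normally be made visible to a Hive session with ADD JAR and then registered with CREATE TEMPORARY FUNCTION ... AS 'fully.qualified.ClassName'; the jar path and function name are yours to choose.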

Also of interest:

Hive User Defined Functions (at the Apache Hive wiki).

You should compare that with:

What are the biggest feature gaps between HiveQL and SQL? (at Quora)

There are plenty of opportunities for new UDFs, including those addressing semantic integration.

I first saw this in NoSQL Weekly, Issue 116.