Apache HCatalog 0.4.0 Released by Alan Gates.
From the post:
In case you didn’t see the news, I wanted to share the announcement that HCatalog 0.4.0 is now available.
For those of you that are new to the project, HCatalog provides a metadata and table management system that simplifies data sharing between Apache Hadoop and other enterprise data systems. You can learn more about the project on the Apache project site.
From the HCatalog documentation (0.4.0):
HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools – Pig, MapReduce, and Hive – to more easily read and write data on the grid. HCatalog’s table abstraction presents users with a relational view of data in the Hadoop distributed file system (HDFS) and ensures that users need not worry about where or in what format their data is stored – RCFile format, text files, or sequence files.
HCatalog supports reading and writing files in any format for which a SerDe can be written. By default, HCatalog supports RCFile, CSV, JSON, and sequence file formats. To use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.
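To make the three required pieces concrete, here is a minimal Python sketch that assembles a Hive-style CREATE TABLE statement naming a SerDe, an InputFormat, and an OutputFormat. It assumes the table is defined through Hive-style DDL. The helper custom_format_ddl and the com.example class names are hypothetical and do not come from the HCatalog documentation.

```python
def custom_format_ddl(table, columns, serde, input_format, output_format):
    """Build a Hive-style CREATE TABLE statement for a custom storage format.

    columns is a list of (name, type) pairs. The three class names are the
    pieces the documentation says a custom format must supply.
    """
    cols = ", ".join(f"{name} {type_}" for name, type_ in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"ROW FORMAT SERDE '{serde}' "
        f"STORED AS INPUTFORMAT '{input_format}' "
        f"OUTPUTFORMAT '{output_format}'"
    )


# Hypothetical class names, used only for illustration.
ddl = custom_format_ddl(
    "web_logs",
    [("ip", "string"), ("hits", "int")],
    serde="com.example.MySerDe",
    input_format="com.example.MyInputFormat",
    output_format="com.example.MyOutputFormat",
)
print(ddl)
```

The statement is only built as a string here; nothing is submitted to a cluster.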
Being curious about a reference to partitions having the capacity to be multidimensional, I set off looking for information on supported data types and found:
The table shows how Pig will interpret the HCatalog data type.
HCatalog Data Type                                         Pig Data Type
primitives (int, long, float, double, string)              int, long, float, double; string to chararray
map (key type should be string, valuetype must be string)  map
List<any type>                                             bag
struct<any type fields>                                    tuple
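The mapping in the table can be restated as a small lookup. The Python sketch below is only an illustration of the table, not part of HCatalog or Pig; the function name pig_type_for is hypothetical, and it looks only at the outer type constructor.

```python
import re

# Primitives from the table; string is the only one renamed on the Pig side.
PRIMITIVES = {
    "int": "int",
    "long": "long",
    "float": "float",
    "double": "double",
    "string": "chararray",
}

# Complex types from the table, keyed by their outer constructor.
COMPLEX = {"map": "map", "list": "bag", "struct": "tuple"}


def pig_type_for(hcat_type):
    """Return the Pig type the table lists for an HCatalog type name."""
    name = hcat_type.strip().lower()
    if name in PRIMITIVES:
        return PRIMITIVES[name]
    # For something like "List<int>", only the leading word matters.
    outer = re.match(r"[a-z]+", name)
    if outer and outer.group(0) in COMPLEX:
        return COMPLEX[outer.group(0)]
    raise ValueError(f"type not covered by the table: {hcat_type!r}")


for t in ["string", "map<string,string>", "List<int>", "struct<a:int>"]:
    print(t, "->", pig_type_for(t))
```

Types the table does not list raise a ValueError rather than guessing a mapping.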
The Hadoop ecosystem is evolving at a fast and furious pace!