Convert Existing Data into Parquet by Uri Laserson.
From the post:
Learn how to convert your data to the Parquet columnar format to get big performance gains.
Using a columnar storage format for your data offers significant performance advantages for a large subset of real-world queries. (Click here for a great introduction.)
Last year, Cloudera, in collaboration with Twitter and others, released a new Apache Hadoop-friendly, binary, columnar file format called Parquet. (Parquet was recently proposed for the ASF Incubator.) In this post, you will get an introduction to converting your existing data into Parquet format, both with and without Hadoop.
Actually, between Uri’s post and my pointing to it, Parquet has been accepted into the ASF Incubator!
All the more reason to start following this project.
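If you want a quick feel for why columnar layout helps before reading Uri's post, here is a minimal Python sketch. It is a toy of my own, not the Parquet format and not anything from the post: the sample records and the `to_columns` helper are made up for illustration. It only shows that a query over one field touches a single list once the data is pivoted into columns.

```python
# Toy illustration only: this is NOT the Parquet file format,
# just the row-versus-column idea. The sample data is hypothetical.
rows = [
    {"user": "alice", "country": "US", "clicks": 12},
    {"user": "bob", "country": "DE", "clicks": 7},
    {"user": "carol", "country": "US", "clicks": 30},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one field still walks every full record.
total_from_rows = sum(record["clicks"] for record in rows)

# Column layout: the same query reads only the "clicks" list.
total_from_columns = sum(columns["clicks"])

print(total_from_rows, total_from_columns)
```

A real columnar file format adds encoding, compression, and metadata on top of this idea, so treat the sketch as intuition rather than a description of how Parquet works internally.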
Enjoy!