Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 27, 2013

Apache Tajo

Filed under: Apache Tajo,HDFS,SQL — Patrick Durusau @ 12:31 pm


From the webpage:

Introduction

Tajo is a relational and distributed data warehouse system for Hadoop. Tajo is designed for low-latency, scalable ad-hoc queries, online aggregation, and ETL on large data sets by leveraging advanced database techniques. It supports standard SQL. Tajo uses HDFS as its primary storage layer and has its own query engine, which allows direct control of distributed execution and data flow. As a result, Tajo has a variety of query evaluation strategies and more optimization opportunities. In addition, Tajo will have native columnar execution along with an optimizer for it.

Features

  • Fast, low-latency processing of SQL queries, including projection, filter, group-by, sort, and join.
  • Rudimentary ETL that transforms data from one format to another.
  • Support for various file formats, such as CSV, RCFile, RowFile (a row store file), and Trevni.
  • Command-line interface that lets users submit SQL queries.
  • Java API that enables clients to submit SQL queries to Tajo.
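For readers less familiar with the operator names in that list, here is a toy, single-machine Python sketch of what a query combining filter, join, group-by, projection, and sort computes. It is not Tajo's engine or its Java API. The `orders` and `customers` tables, their columns, and the `run_query` helper are hypothetical, and the hash join is just one common join strategy chosen for illustration.

```python
from collections import defaultdict

# Hypothetical tables (lists of dicts) standing in for data stored in HDFS.
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 25.0},
    {"order_id": 2, "cust_id": 11, "amount": 40.0},
    {"order_id": 3, "cust_id": 10, "amount": 20.0},
    {"order_id": 4, "cust_id": 12, "amount": 5.0},
]
customers = [
    {"cust_id": 10, "name": "alice"},
    {"cust_id": 11, "name": "bob"},
    {"cust_id": 12, "name": "carol"},
]


def run_query(orders, customers, min_amount):
    """Hypothetical helper, roughly equivalent to:
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o JOIN customers c ON o.cust_id = c.cust_id
    WHERE o.amount >= min_amount
    GROUP BY c.name
    ORDER BY total DESC, c.name
    """
    # Filter: keep only rows that satisfy the WHERE predicate.
    filtered = [o for o in orders if o["amount"] >= min_amount]
    # Join: build a hash table on customers, then probe it with each order.
    by_id = {c["cust_id"]: c for c in customers}
    joined = [(by_id[o["cust_id"]]["name"], o["amount"])
              for o in filtered if o["cust_id"] in by_id]
    # Group-by with aggregation: sum the amounts per customer name.
    totals = defaultdict(float)
    for name, amount in joined:
        totals[name] += amount
    # Projection and sort: emit only (name, total), largest total first,
    # breaking ties by name.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


print(run_query(orders, customers, min_amount=10.0))
```

With `min_amount=10.0` the 5.0 order is filtered out, so this prints `[('alice', 45.0), ('bob', 40.0)]`. A distributed engine such as Tajo spreads the same logical steps across many nodes, which is where its choice of evaluation strategies comes in.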

If you ever wanted to get in on the ground floor of a data warehouse project, this could be your chance!

I first saw this at Apache Incubator: Tajo – a Relational and Distributed Data Warehouse for Hadoop by Alex Popescu.
