Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 29, 2012

HadoopDB: Efficient Processing of Data Warehousing Queries in a Split Execution Environment

Filed under: Hadapt,Hadoop — Patrick Durusau @ 9:14 pm

HadoopDB: Efficient Processing of Data Warehousing Queries in a Split Execution Environment

From the post:

The buzz about Hadapt and HadoopDB has been around for a while now, as it is one of the first systems to combine ideas from two different approaches, namely parallel databases based on a shared-nothing architecture and map-reduce, to address the problem of large-scale data storage and analysis.

This early paper, which introduced HadoopDB, crisply summarizes some reasons why parallel database solutions haven’t scaled to hundreds of machines. The reasons include:

  1. As the number of nodes in a system increases, failures become more common.
  2. Parallel databases usually assume a homogeneous array of machines, which becomes impractical as the number of machines rises.
  3. They have not been tested at larger scales, as applications haven’t demanded more than tens of nodes for performance until recently.
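
The first reason can be made concrete with a little arithmetic. The sketch below is not from the paper or the quoted post; it assumes nodes fail independently and uses a made-up per-node failure probability of 1% over the life of a query, purely for illustration. Under those assumptions, the chance that at least one of n nodes fails is 1 - (1 - p)^n, which climbs quickly as n grows.

```python
def prob_any_failure(num_nodes: int, per_node_failure_prob: float) -> float:
    """Chance that at least one of num_nodes nodes fails.

    Assumes nodes fail independently with the same probability,
    which is a simplification made for this illustration.
    """
    return 1.0 - (1.0 - per_node_failure_prob) ** num_nodes


if __name__ == "__main__":
    # Hypothetical value: 1% chance that a given node fails during one query.
    p = 0.01
    for n in (10, 100, 1000):
        print(f"{n:>5} nodes: {prob_any_failure(n, p):.3f}")
```

With tens of nodes a failure during a query is an occasional event; with hundreds or thousands it is close to the normal case, which is why a system that restarts the whole query on any failure struggles at that scale.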

Interesting material to follow on the HPCC vs. Hadoop post.

Not to take sides, just the beginning of the type of analysis that will be required.
