Archive for the ‘TPC-H’ Category

TPC Benchmark H

Wednesday, February 13th, 2013

TPC Benchmark H

From the webpage:

Summary

The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.

Just in case you want to incorporate the TPC-H benchmark into your NoSQL solution.
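
For anyone who does, the composite metric quoted above can be sketched in a few lines. This is a minimal illustration based on my reading of the TPC-H specification, where QphH@Size is the geometric mean of the single-stream power figure and the multi-stream throughput figure. The function names, parameter names, and sample timings are my own and purely illustrative. A real result also has to follow the specification's run, audit, and pricing rules.

```python
import math


def power_at_size(query_secs, refresh_secs, scale_factor):
    """Single-stream power: 3600 * SF over the geometric mean of all timings."""
    timings = list(query_secs) + list(refresh_secs)
    # Geometric mean computed through logs to avoid overflow or underflow.
    log_mean = sum(math.log(t) for t in timings) / len(timings)
    return 3600.0 * scale_factor / math.exp(log_mean)


def throughput_at_size(num_streams, total_secs, scale_factor, num_queries=22):
    """Multi-stream throughput: queries completed per hour, scaled by SF."""
    return num_streams * num_queries * 3600.0 / total_secs * scale_factor


def qphh_at_size(power, throughput):
    """Composite metric: geometric mean of power and throughput."""
    return math.sqrt(power * throughput)


def price_per_qphh(system_price, qphh):
    """Price/performance: total system price divided by QphH@Size."""
    return system_price / qphh


if __name__ == "__main__":
    # Hypothetical timings in seconds: 22 queries and 2 refresh functions.
    queries = [12.0] * 22
    refreshes = [30.0, 30.0]
    power = power_at_size(queries, refreshes, scale_factor=100)
    # Hypothetical throughput run: 5 streams finishing in 2 hours.
    throughput = throughput_at_size(5, 7200.0, scale_factor=100)
    qphh = qphh_at_size(power, throughput)
    print(f"QphH@100GB = {qphh:.1f}")
    print(f"$/QphH@100GB = {price_per_qphh(500000.0, qphh):.2f}")
```

Because the power figure uses a geometric mean, one very slow query hurts less than it would under an arithmetic mean, and one very fast query cannot carry the score alone.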

I don’t recall any literature on benchmarks for semantic integration solutions.

At least in the sense of either the speed of semantic integration based on some test of semantic equivalence or the range of semantic equivalents handled by a particular engine.

You?

Imperative and Declarative Hadoop: TPC-H in Pig and Hive

Wednesday, February 13th, 2013

Imperative and Declarative Hadoop: TPC-H in Pig and Hive by Russell Jurney.

From the post:

According to the Transaction Processing [Performance] Council, TPC-H is:

The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

TPC-H was implemented for Hive in HIVE-600 and for Pig in PIG-2397 by Hortonworks intern Jie Li. In going over this work, I was struck by how it outlined differences between Pig and SQL.

There seems to be a tendency for simple SQL to provide greater clarity than Pig. At some point as the TPC-H queries become more demanding, complex SQL seems to have less clarity than the comparable Pig. Let's take a look.
(emphasis in original)

A refresher on the lesson that the solution you need, in this case Hive or Pig, depends upon your requirements.

Use either one blindly and you risk poor performance or failing to meet other requirements.