Archive for the ‘Shark’ Category

Analytics and Machine Learning at Scale [Aug. 29-30]

Tuesday, August 27th, 2013

AMP Camp Three – Analytics and Machine Learning at Scale

From the webpage:

AMP Camp Three – Analytics and Machine Learning at Scale will be held in Berkeley, California, August 29-30, 2013. AMP Camp 3 attendees and online viewers will learn to solve big data problems using components of the Berkeley Data Analytics Stack (BDAS) and cutting-edge machine learning algorithms.

Live streaming!

Sessions will cover (among other things): Mesos, Spark, Shark, Spark Streaming, BlinkDB, MLbase, Tachyon and GraphX.

Talk about a jolt before the weekend!

Shark (Hive on Spark)

Monday, November 26th, 2012

From the webpage:

Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can answer Hive QL queries up to 70 times faster than Hive without modification to the existing data or queries. Shark supports Hive’s query language, metastore, serialization formats, and user-defined functions.

We released Shark 0.2 on Oct 15, 2012. The new version is much more stable and also features significant performance improvements.

Getting Started

See our documentation on GitHub to get started. It takes around 5 mins to set up Shark on a single node for a quick spin, and about 20 mins on an Amazon EC2 cluster.

Fast Execution Engine

Shark is built on top of Spark, a data-parallel execution engine that is fast and fault-tolerant. Even if data are on disk, Shark can be noticeably faster than Hive because of the fast execution engine. It avoids the high task-launching overhead of Hadoop MapReduce and does not require materializing intermediate data between stages on disk. Thanks to this fast engine, Shark can answer queries with sub-second latency.
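To make the point about intermediate data concrete, here is a toy sketch in plain Python. It is not Shark, Spark, or Hadoop code: the sample rows and the function names (`keep_ok`, `count_by_page`, `run_materialized`, `run_pipelined`) are invented for illustration. It runs the same two-stage query twice, once writing the first stage's output to a temporary file before the second stage reads it back, and once handing rows from one stage to the next in memory.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical sample data standing in for a table of web requests.
ROWS = [
    {"page": "/home", "status": 200},
    {"page": "/home", "status": 500},
    {"page": "/docs", "status": 200},
    {"page": "/home", "status": 200},
]


def keep_ok(rows):
    """Stage 1: keep only successful requests (lazy generator)."""
    return (r for r in rows if r["status"] == 200)


def count_by_page(rows):
    """Stage 2: count the remaining rows per page."""
    return dict(Counter(r["page"] for r in rows))


def run_materialized(rows):
    """Write stage 1 output to disk, then read it back for stage 2."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    try:
        with os.fdopen(fd, "w") as out:
            for r in keep_ok(rows):
                out.write(json.dumps(r) + "\n")
        with open(path) as src:
            # count_by_page consumes the file fully before the block exits.
            return count_by_page(json.loads(line) for line in src)
    finally:
        os.remove(path)


def run_pipelined(rows):
    """Feed stage 1 output straight into stage 2 without touching disk."""
    return count_by_page(keep_ok(rows))


if __name__ == "__main__":
    print(run_materialized(ROWS))
    print(run_pipelined(ROWS))
```

Both functions return the same counts; the only difference is the round trip through the file system between stages, which is the cost the quoted passage says Shark avoids.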

They say that imitation is the sincerest form of flattery.

In software, do claims of compatibility with your software mean the same thing?

It isn’t possible to know which database solutions will be around in five years, but the rapid emergence of alternative solutions certainly is exciting!