Archive for the ‘HyperDex’ Category

HyperDex 1.0 Release

Tuesday, December 10th, 2013

HyperDex 1.0 Release

From the webpage:

We are proud to announce HyperDex 1.0.0. With this official release, we pass the 1.0 development milestone. Key features of this release are:

  • High Performance: HyperDex is fast. It outperforms MongoDB and Cassandra on industry-standard benchmarks by a factor of 2X or more.
  • Advanced Functionality: With the Warp add-on, HyperDex offers multi-key transactions that span multiple objects with ACID guarantees.
  • Strong Consistency: HyperDex ensures that every GET returns the result of the latest PUT.
  • Fault Tolerance: HyperDex automatically replicates data to tolerate a configurable number of failures.
  • Scalable: HyperDex automatically redistributes data to make use of new resources as you add more nodes to your cluster.

HyperDex runs on 64-bit Linux (Ubuntu, Debian, Fedora, CentOS) and OS X. Binary packages for Debian 7, Ubuntu 12.04-13.10, Fedora 18-20, and CentOS 6 are available from the Downloads page, as well as source tarballs for other Linux platforms.

This release provides bindings for C, C++, Python, Java, Ruby, and Go.

If that sounds good to you, drop by the Get HyperDex page.

See also: HyperDex Reference Manual v1.0.dev by Robert Escriva, Bernard Wong, and Emin Gün Sirer.

For the real story, see Papers and read HyperDex: A Distributed, Searchable Key-Value Store by Robert Escriva, Bernard Wong and Emin Gün Sirer.

The multidimensional aspects of HyperDex resemble recent efforts to move beyond surface tokens, otherwise known as words.

HyperDex 1.0RC5

Wednesday, November 20th, 2013

HyperDex 1.0RC5 by Robert Escriva.

From the post:

We are proud to announce HyperDex 1.0.rc5, the next generation NoSQL data store that provides ACID transactions, fault tolerance, and high performance. This new release has a number of exciting features:

  • Improved cluster management. The cluster will automatically grow as new nodes are added.
  • Backup support. Take backups of the coordinator and daemons in a consistent state and be able to restore the cluster to the point when the backup was taken.
  • An admin library which exposes performance counters for tracking cluster-wide statistics relating to HyperDex.
  • Support for HyperLevelDB. This is the first HyperDex release to use HyperLevelDB, which brings higher performance than Google’s LevelDB.
  • Secondary indices. Secondary indices improve the speed of search without the overhead of creating a subspace for the indexed attributes.
  • New atomic operations. Most key-based operations now have conditional atomic equivalents.
  • Improved coordinator stability. This release introduces an improved coordinator that fixes a few stability problems reported by users.
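To make the "conditional atomic equivalents" bullet concrete, here is a minimal sketch of a conditional put. It uses a toy in-memory store, not the HyperDex client: the class `ToyStore` and its `cond_put` method are hypothetical names invented for this illustration, and the single lock stands in for whatever mechanism a real store uses to make the check and the write indivisible.

```python
import threading


class ToyStore:
    """In-memory stand-in for a key-value store (hypothetical, not the HyperDex client)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, key, attrs):
        """Unconditionally merge attrs into the object stored under key."""
        with self._lock:
            self._objects.setdefault(key, {}).update(attrs)

    def get(self, key):
        """Return a copy of the stored object, or None if the key is absent."""
        with self._lock:
            obj = self._objects.get(key)
            return dict(obj) if obj is not None else None

    def cond_put(self, key, expected, attrs):
        """Apply attrs only if every attribute in expected matches the stored value.

        The comparison and the write happen under one lock, so no other
        writer can slip in between them.
        """
        with self._lock:
            obj = self._objects.get(key)
            if obj is None or any(obj.get(k) != v for k, v in expected.items()):
                return False
            obj.update(attrs)
            return True


store = ToyStore()
store.put("acct-1", {"balance": 100})

# Succeeds: the stored balance is still 100.
first = store.cond_put("acct-1", {"balance": 100}, {"balance": 70})
# Fails: the balance is now 70, so the stale expectation no longer holds.
second = store.cond_put("acct-1", {"balance": 100}, {"balance": 50})
```

The second call is rejected rather than silently overwriting the first, which is the point of pairing a key-based operation with a condition.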

Binary packages for Debian 7, Ubuntu 12.04-13.10, Fedora 18-19, and CentOS 6 are available on the HyperDex Download page, as well as source tarballs for other Linux platforms.

BTW, HyperDex has a cool logo:

[Image: HyperDex logo]

Good logos are like good book covers: they catch the eye of potential customers.

A book sale starts when a customer picks a book up, hence the need for a good cover.

What sort of cover does your favorite semantic application have?

Warp: Multi-Key Transactions for Key-Value Stores

Saturday, May 18th, 2013

Warp: Multi-Key Transactions for Key-Value Stores by Robert Escriva, Bernard Wong and Emin Gün Sirer.

Abstract:

Implementing ACID transactions has been a longstanding challenge for NoSQL systems. Because these systems are based on a sharded architecture, transactions necessarily require coordination across multiple servers. Past work in this space has relied either on heavyweight protocols such as Paxos or clock synchronization for this coordination.

This paper presents a novel protocol for coordinating distributed transactions with ACID semantics on top of a sharded data store. Called linear transactions, this protocol achieves scalability by distributing the coordination task to only those servers that hold relevant data for each transaction. It achieves high performance by serializing only those transactions whose concurrent execution could potentially yield a violation of ACID semantics. Finally, it naturally integrates chain-replication and can thus tolerate faults of both clients and servers. We have fully implemented linear transactions in a commercially available data store. Experiments show that the throughput of this system achieves 1-9× more throughput than MongoDB, Cassandra and HyperDex on the Yahoo! Cloud Serving Benchmark, even though none of the latter systems provide transactional guarantees.

Warp looks wicked cool!

Of particular interest is the non-ordering of transactions that have no impact on other transactions. That alone would be interesting for a topic map merging situation.
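The idea of ordering only those transactions that can affect one another can be sketched with read and write sets. This is not Warp's linear transactions protocol, only a conventional conflict test that I am assuming as a stand-in; the `Txn` type and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Txn:
    """A transaction described only by the keys it reads and writes (hypothetical type)."""
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def conflicts(a, b):
    """Two transactions conflict when one writes a key the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def needs_ordering(txns):
    """Return the pairs of transaction names that must be ordered relative to each other."""
    pairs = []
    for i, a in enumerate(txns):
        for b in txns[i + 1:]:
            if conflicts(a, b):
                pairs.append((a.name, b.name))
    return pairs


t1 = Txn("t1", reads=frozenset({"x"}), writes=frozenset({"y"}))
t2 = Txn("t2", reads=frozenset({"y"}), writes=frozenset({"z"}))
t3 = Txn("t3", reads=frozenset({"a"}), writes=frozenset({"b"}))

# Only t1 and t2 touch a common key ("y"); t3 is independent of both.
ordered_pairs = needs_ordering([t1, t2, t3])
```

In a topic map merge, the analogue would be that merges touching disjoint sets of topics need no ordering relative to each other.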

For more details, see the Warp page, or

Download Warp

Warp Tutorial

Warp Performance Benchmarks

I first saw this at High Scalability.

Getting Started With Hyperdex

Saturday, August 11th, 2012

Getting Started With Hyperdex by Ṣeyi Ogunyẹ́mi.

From the post:

Alright, let’s start this off with a fitting soundtrack just because we can. Open it up in a tab and come back?

Greetings, valiant adventurer!

So, I heard you care about data. You aren’t storing your precious data in anything that acknowledges PUT requests before being certain it’ll be able to return it to you? Well then, you’ve come to the right place.

Okay, I’m clearly excited, but with good reason. Some time in the past few months, I ran into a paper: “HyperDex: A Distributed, Searchable Key-Value Store” from a team at Cornell. By now the typical reaction to NoSQL news tends to be that your eyes glaze over and you start mouthing “…is Web-Scale™”, but this isn’t “yet another NoSQL database”. So, I’ve finally gotten round to writing this piece in hopes of sharing it with others.

Before plunging into the deep end, it’s probably a good idea to discuss why I’ve found HyperDex to be particularly exciting. For reasons that will probably be in a different blog post, I’ve been researching the design of a distributed key/value store with support for strong consistency (for the morbidly curious, it’s connected to Ampify). You must realise that the state-of-the-art distributed key/value stores such as Dynamo (and its open-source clone, Riak) tend to aim for eventual consistency.

If you aren’t already experimenting with HyperDex, you may well be after reading this post.

Hyperdex: Documentation

Thursday, April 19th, 2012

Posting on the Hyperdex documentation separately from its latest release. It may be of more lasting interest.

Current Documentation:

Hyperdex Documentation (web based)

Hyperdex Documentation (PDF)

Mailing lists:

hyperdex-announce

hyperdex-discuss

Other:

Hyperdex: A New Era in High Performance Data Stores for the Cloud (presentation, 13 April 2012)

Hyperdex Tutorial

Hyperdex (homepage)

Hyperdex: A Searchable Distributed Key-Value Store (New Release)

Thursday, April 19th, 2012

Hyperdex: A Searchable Distributed Key-Value Store (New Release)

From the homepage:

2012-04-16: NEW RELEASE! HyperDex now supports lists, sets, and maps natively, with atomic operations on each of these structures. This enables HyperDex to be used in ever-more demanding applications that make use of these rich datastructures.
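Why atomic operations on lists, sets, and maps matter is easiest to see under concurrency. The sketch below is a toy in-memory object guarded by a lock, not HyperDex itself; `ToyObject` and its method names are hypothetical. Each operation does its read-modify-write in one indivisible step, so concurrent writers cannot lose each other's updates the way a client-side get, modify, put sequence can.

```python
import threading


class ToyObject:
    """One stored object whose attributes are a list, a set, and a map (hypothetical)."""

    def __init__(self):
        self.attrs = {"log": [], "tags": set(), "counts": {}}
        self._lock = threading.Lock()

    def list_append(self, attr, item):
        with self._lock:
            self.attrs[attr].append(item)

    def set_add(self, attr, item):
        with self._lock:
            self.attrs[attr].add(item)

    def map_increment(self, attr, key, delta=1):
        # The read and the write happen under the lock, so no update is lost.
        with self._lock:
            current = self.attrs[attr].get(key, 0)
            self.attrs[attr][key] = current + delta


obj = ToyObject()


def worker(worker_id):
    for _ in range(100):
        obj.map_increment("counts", "visits")
    obj.set_add("tags", "seen")
    obj.list_append("log", worker_id)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

visits = obj.attrs["counts"]["visits"]  # 8 workers x 100 increments each
```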

HyperDex: A Distributed, Searchable Key-Value Store for Cloud Computing

Thursday, February 23rd, 2012

HyperDex: A Distributed, Searchable Key-Value Store for Cloud Computing by Robert Escriva, Bernard Wong and Emin Gün Sirer.

Abstract:

Distributed key-value stores are now a standard component of high-performance web services and cloud computing applications. While key-value stores offer significant performance and scalability advantages compared to traditional databases, they achieve these properties through a restricted API that limits object retrieval—an object can only be retrieved by the (primary and only) key under which it was inserted. This paper presents HyperDex, a novel distributed key-value store that provides a unique search primitive that enables queries on secondary attributes. The key insight behind HyperDex is the concept of hyperspace hashing in which objects with multiple attributes are mapped into a multidimensional hyperspace. This mapping leads to efficient implementations not only for retrieval by primary key, but also for partially-specified secondary attribute searches and range queries. A novel chaining protocol enables the system to provide strong consistency guarantees while supporting replication. An evaluation of the full system shows that HyperDex is orders of magnitude faster than Cassandra and MongoDB for finding partially specified objects. Additionally, HyperDex achieves high performance for simple get/put operations compared to current state-of-the-art key-value stores, with stronger fault tolerance and comparable scalability properties.
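The hyperspace hashing idea in the abstract can be sketched in a few lines. This is my own minimal reading of it, not the authors' implementation: each attribute gets its own axis, each value is hashed to a coordinate on that axis, and the coordinate tuple names the region responsible for the object. The schema, the bucket count, and all function names are assumptions made for the illustration.

```python
import hashlib
from itertools import product

ATTRIBUTES = ("first", "last", "phone")  # hypothetical schema, one axis per attribute
BUCKETS = 4                              # partitions per axis (assumption)


def axis_hash(value):
    """Stable hash of one attribute value onto an axis coordinate."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def region_of(obj):
    """Coordinate tuple of the region that holds a fully specified object."""
    return tuple(axis_hash(obj[name]) for name in ATTRIBUTES)


def regions_for_search(query):
    """Regions that could hold objects matching a partially specified query.

    A specified attribute pins its axis to one coordinate; an unspecified
    attribute leaves the whole axis open.
    """
    axes = [
        [axis_hash(query[name])] if name in query else range(BUCKETS)
        for name in ATTRIBUTES
    ]
    return set(product(*axes))


obj = {"first": "John", "last": "Smith", "phone": "607-555-1024"}
home = region_of(obj)

by_last = regions_for_search({"last": "Smith"})                         # 4 * 1 * 4 regions
by_first_last = regions_for_search({"first": "John", "last": "Smith"})  # 1 * 1 * 4 regions
```

Every attribute added to the query pins another axis and shrinks the set of regions to contact. Because this sketch hashes values, it handles equality searches only; the range queries the abstract mentions would need an order-preserving mapping instead.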

This paper merited a separate posting from the software.

Among many interesting points was the following one from the introduction:

A naive Euclidean space construction, however, can suffer from the “curse of dimensionality,” as the space exhibits an exponential increase in volume with each additional secondary attribute [8]. For objects with many attributes, the resulting Euclidean space would be large, and consequently, sparse. Nodes would then be responsible for large regions in the hyperspace, which would increase the number of nodes whose regions intersect search hyperplanes and thus limit the effectiveness of the basic approach. HyperDex addresses this problem by introducing an efficient and lightweight mechanism that partitions the data into smaller, limited-size sub-spaces, where each subspace covers a subset of object attributes in a lower dimensional hyperspace. Thus, by folding the hyperspace back into a lower number of dimensions, HyperDex can ensure higher node selectivity during searches.
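The arithmetic behind the quoted passage can be worked through with a small sketch. The assumptions are mine: every axis is cut into the same number of buckets, a search pins one coordinate per specified attribute, and the search is sent to whichever subspace leaves the fewest regions to visit. The attribute names, the subspace groupings, and the function names are hypothetical.

```python
def regions_touched(dimensions, buckets, specified):
    """Regions a search visits in a grid with `dimensions` axes when `specified` axes are pinned."""
    return buckets ** (dimensions - specified)


def best_subspace(subspaces, query_attrs, buckets):
    """Pick the subspace in which the query leaves the fewest regions to visit."""
    costs = {}
    for name, attrs in subspaces.items():
        pinned = sum(1 for attr in attrs if attr in query_attrs)
        costs[name] = regions_touched(len(attrs), buckets, pinned)
    best = min(costs, key=costs.get)
    return best, costs


BUCKETS = 4

# One 9-dimensional hyperspace: a search on a single attribute leaves 8 axes open.
full_space_cost = regions_touched(dimensions=9, buckets=BUCKETS, specified=1)

# The same nine attributes folded into three 3-dimensional subspaces.
subspaces = {
    "names": ("first", "last", "city"),
    "contact": ("phone", "email", "zip"),
    "misc": ("age", "dept", "title"),
}
best, costs = best_subspace(subspaces, query_attrs={"last"}, buckets=BUCKETS)
```

With four buckets per axis, a search on `last` alone spans 4^8 = 65,536 regions in the single nine-dimensional space but only 4^2 = 16 regions in the three-dimensional `names` subspace, which is the selectivity gain the authors describe.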

Something keeps nagging at me about the use of the term Euclidean space. Since a Euclidean space is a metric space, I “get” how they can partition metric data into smaller sub-spaces.

Names don’t exist in metric spaces, but sort orders and frequencies are known well enough to approximate such a solution. Or are they? I assume that is the case for the more common languages, but that is likely a poor assumption on my part.

What of other non-metric space values? On what basis would they be partitioned?

HyperDex: A Searchable Distributed Key-Value Store

Thursday, February 23rd, 2012

HyperDex: A Searchable Distributed Key-Value Store

From the webpage:

HyperDex is a distributed, searchable key-value store. HyperDex provides a unique search primitive which enables searches over stored values. By design, HyperDex retains the performance of traditional key-value stores while enabling support for the search operation.

The key features of HyperDex are:

  • Fast: HyperDex has lower latency and higher throughput than most other key-value stores.
  • Searchable: HyperDex enables lookups of non-primary data attributes. Such searches are implemented efficiently and contact a small number of servers.
  • Scalable: HyperDex scales as more machines are added to the system.
  • Consistent: The value you GET is always the latest value you PUT. Not just "eventually," but immediately and always.
  • Fault tolerant: HyperDex handles failures. Data is automatically replicated on multiple machines so that failures do not cause data loss.
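One classic way to get both the "Consistent" and "Fault tolerant" properties is chain replication. The sketch below shows that general technique in a single process. I am not claiming it is HyperDex's actual protocol, and the `Replica` and `Chain` classes are hypothetical. Writes pass through every live replica from head to tail, and reads are served by the tail, so a read never sees a write that has not reached all live replicas.

```python
class Replica:
    """One copy of the data (hypothetical, in-process stand-in for a server)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class Chain:
    """Writes enter at the head and are acknowledged by the tail; reads go to the tail."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]

    def _live(self):
        return [r for r in self.replicas if r.alive]

    def put(self, key, value):
        live = self._live()
        if not live:
            raise RuntimeError("no live replicas")
        for replica in live:           # head first, tail last
            replica.data[key] = value
        return live[-1].name           # the tail acknowledges the write

    def get(self, key):
        live = self._live()
        if not live:
            raise RuntimeError("no live replicas")
        return live[-1].data.get(key)  # the tail holds only fully propagated writes

    def fail(self, name):
        """Mark a replica as failed; the chain simply skips it from now on."""
        for replica in self.replicas:
            if replica.name == name:
                replica.alive = False


chain = Chain(["a", "b", "c"])
ack1 = chain.put("k", "v1")      # acknowledged by tail "c"
chain.fail("c")                  # the tail fails; "b" becomes the new tail
after_failure = chain.get("k")   # "b" already holds v1, so nothing is lost
ack2 = chain.put("k", "v2")      # acknowledged by the new tail "b"
latest = chain.get("k")
```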

Source code is available subject to this license.