Archive for the ‘Key-Value Stores’ Category

No Query Language Needed: Using Python with an Ordered Key-Value Store

Friday, October 10th, 2014

No Query Language Needed: Using Python with an Ordered Key-Value Store by Stephen Pimentel.

From the post:

FoundationDB is a complex and powerful database, designed to handle sharding, replication, network hiccups, and server failures gracefully and automatically. However, when we designed our Python API, we wanted most of that complexity to be hidden from the developer. By utilizing familiar features- such as generators, itertools, and comprehensions-we tried to make FoundationDB’s API as easy to us as a Python dictionary.

In the video below, I show how FoundationDB lets you query data directly using Python language features, rather than a separate query language.

Most applications have back-end data stores that developers need to query. This talk presents an approach to storing and querying data that directly employs Python language features. Using the Key-Value Store, we can make our data persistent with an interface similar to a Python dictionary. Python then gives us a number of tools “out of the box” that we can use to form queries:

  • generators for memory-efficient data retrieval;
  • itertools to filter and group data;
  • comprehensions to assemble the query results.

Taken together, these features give us a query capability using straight Python. The talk walks through a number of example queries using the Enron email dataset. For code and the details.

More motivation to take a look at FoundationDB!

I do wonder about the “no query language needed.” Users, despite their poor results, appear to be committed to querying and query languages.

Whether it is the illusion of “empowerment” of users, the current inability to measure the cost of ineffectual searching, or acceptance of poor search results, search and search operators continue to be the preferred means of interaction. Plan accordingly.

I first saw this in a tweet by Hari Kishan.

FoundationDB: Developer Recipes

Thursday, April 24th, 2014

FoundationDB: Developer Recipes

From the webpage:

Learn how to build new data models, indexes, and more on top of the FoundationDB key-value store API.

I was musing the other day about how to denormalize a data structure for indexing.

This is the reverse of that process but still should be instructive.

Graphistas should note that FoundationDB also implements the Blueprints API (blueprints-foundationdb-graph).

FoundationDB Developer Guide & API Reference

Thursday, January 16th, 2014

FoundationDB Developer Guide & API Reference

From the webpage:

Foundation’s scalability and performance make it an ideal back end for supporting the operation of critical applications. FoundationDB provides a simple data model coupled with powerful transactional integrity. This document gives an overview of application development using FoundationDB, including use of the API, working with transactions, and performance considerations.

When I saw a tweet from FoundationDB that read:

More into theory or practice? Either way, check out the FoundationDB Developer Guide & API Reference

I just had to go look! 😉


Under the Hood: [of RocksDB]

Sunday, November 24th, 2013

Under the Hood: Building and open-sourcing RocksDB by Dhruba Borthakur.

From the post:

Every time one of the 1.2 billion people who use Facebook visits the site, they see a completely unique, dynamically generated home page. There are several different applications powering this experience–and others across the site–that require global, real-time data fetching.

Storing and accessing hundreds of petabytes of data is a huge challenge, and we’re constantly improving and overhauling our tools to make this as fast and efficient as possible. Today, we are open-sourcing RocksDB, an embeddable, persistent key-value store for fast storage that we built and use here at Facebook.

Why build an embedded database?

Applications traditionally access their data via remote procedure calls over a network connection, but that can be slow–especially when we need to power user-facing products in real time. With the advent of flash storage, we are starting to see newer applications that can access data quickly by managing their own dataset on flash instead of accessing data over a network. These new applications are using what we call an embedded database.

There are several reasons for choosing an embedded database. When database requests are frequently served from memory or from very fast flash storage, network latency can slow the query response time. Accessing the network within a data center can take about 50 microseconds, as can fast-flash access latency. This means that accessing data over a network could potentially be twice as slow as an application accessing data locally.

Secondly, we are starting to see servers with an increasing number of cores and with storage-IOPS reaching millions of requests per second. Lock contention and a high number of context switches in traditional database software prevents it from being able to saturate the storage-IOPS. We’re finding we need new database software that is flexible enough to be customized for many of these emerging hardware trends.

Like most of you, I don’t have 1.2 billion people visiting my site. 😉

However, understanding today’s “high-end” solutions will prepare you for tomorrow’s “middle-tier” solution and day after tomorrow’s desktop solution.

A high level overview of RocksDB.

Other resources to consider:

RocksDB Facebook page.

RocksDB on Github.

Update: Igor Canadi has posted to the Facebook page a proposal to add the concept of ColumnFamilies to RocksDB. Comments? (Direct comments on that proposal to the Facebook page for RocksDB.)


Friday, November 15th, 2013

rocksdb by Facebook.

Just in case you can’t wait for OhmDB to appear, Facebook has open sourced rocksdb.

From the readme file:

rocksdb: A persistent key-value store for flash storage
Authors: * The Facebook Database Engineering Team
         * Build on earlier work on leveldb by Sanjay Ghemawat
           ( and Jeff Dean (

This code is a library that forms the core building block for a fast
key value server, especially suited for storing data on flash drives.
It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor(WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor(SAF). It has multi-threaded compactions,
making it specially suitable for storing multiple terabytes of data in a
single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a
persistent key/value store.

See doc/index.html for more explanation.
See doc/impl.html for a brief overview of the implementation.

The public interface is in include/*.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Something to keep in mind to use with that multi-terabyte drive you are eyeing as a present to yourself. 😉


Thursday, September 19th, 2013


From the webpage:

Sparkey is an extremely simple persistent key-value store. You could think of it as a read-only hashtable on disk and you wouldn’t be far off. It is designed and optimized for some server side usecases at Spotify but it is written to be completely generic and makes no assumptions about what kind of data is stored.

Some key characteristics:

  • Supports data sizes up to 2^63 – 1 bytes.
  • Supports iteration, get, put, delete
  • Optimized for bulk writes.
  • Immutable hash table.
  • Any amount of concurrent independent readers.
  • Only allows one writer at a time per storage unit.
  • Cross platform storage file.
  • Low overhead per entry.
  • Constant read startup cost
  • Low number of disk seeks per read
  • Support for block level compression.
  • Data agnostic, it just maps byte arrays to byte arrays.

What it’s not:

  • It’s not a distributed key value store – it’s just a hash table on disk.
  • It’s not a compacted data store, but that can be implemented on top of it, if needed.
  • It’s not robust against data corruption.

The usecase we have for it at Spotify is serving data that rarely gets updated to users or other services. The fast and efficient bulk writes makes it feasible to periodically rebuild the data, and the fast random access reads makes it suitable for high throughput low latency services. For some services we have been able to saturate network interfaces while keeping cpu usage really low.

If you are looking for a very high-performance key-value store with little to no frills, your search may be over.

Originating with Spotify and being able to saturate network interfaces bodes well for those needing pure performance.

I first saw this in Nat Torkington’s Four short links: 10 September 2013.

Warp: Multi-Key Transactions for Key-Value Stores

Saturday, May 18th, 2013

Warp: Multi-Key Transactions for Key-Value Stores by Robert Escriva, Bernard Wong and Emin Gün Sirer†.


Implementing ACID transactions has been a longstanding challenge for NoSQL systems. Because these systems are based on a sharded architecture, transactions necessarily require coordination across multiple servers. Past work in this space has relied either on heavyweight protocols such as Paxos or clock synchronization for this coordination.

This paper presents a novel protocol for coordinating distributed transactions with ACID semantics on top of a sharded data store. Called linear transactions, this protocol achieves scalability by distributing the coordination task to only those servers that hold relevant data for each transaction. It achieves high performance by serializing only those transactions whose concurrent execution could potentially yield a violation of ACID semantics. Finally, it naturally integrates chain-replication and can thus tolerate faults of both clients and servers. We have fully implemented linear transactions in a commercially available data store. Experiments show that the throughput of this system achieves 1-9× more throughput than MongoDB, Cassandra and HyperDex on the Yahoo! Cloud Serving Benchmark, even though none of the latter systems provide transactional guarantees.

Warp looks wicked cool!

Of particular interest is the non-ordering of transactions that have no impact on other transactions. That alone would be interesting for a topic map merging situation.

For more details, see the Warp page, or

Download Warp

Warp Tutorial

Warp Performance Benchmarks

I first saw this at High Scalability.


Thursday, March 28th, 2013


From the webpage:

Elephant is an S3-backed key-value store with querying powered by Elastic Search. Your data is persisted on S3 as simple JSON documents, but you can instantly query it over HTTP.

Suddenly, your data becomes as durable as S3, as portable as JSON, and as queryable as HTTP. Enjoy!

i don’t recall seeing Elephant on the Database Landscape Map – February 2013. Do you?

Every database is thought, at least by its authors, to be different from all the others.

What dimensions would be the most useful ones for distinction/comparison?


I first saw this in Nat Torkington’s Four short links: 27 March 2013.

Database Landscape Map – February 2013

Wednesday, March 27th, 2013

Database Landscape Map – February 2013 by 451 Research.

Database map

A truly awesome map of available databases.

Originated from Neither fish nor fowl: the rise of multi-model databases by Matthew Aslett.

Matthew writes:

One of the most complicated aspects of putting together our database landscape map was dealing with the growing number of (particularly NoSQL) databases that refuse to be pigeon-holed in any of the primary databases categories.

I have begun to refer to these as “multi-model databases” in recognition of the fact that they are able to take on the characteristics of multiple databases. In truth though there are probably two different groups of products that could be considered “multi-model”:

I think I understand the grouping from the key to the map but the ordering within groups, if meaningful, escapes me.

I am sure you will recognize most of the names but equally sure there will be some you can’t quite describe.


G-Store: [Multi key Access]

Friday, November 2nd, 2012

G-Store: A Scalable Data Store for Transactional Multi key Access in the Cloud by Sudipto Das, Divyakant Agrawal and Amr El Abbadi.


Cloud computing has emerged as a preferred platform for deploying scalable web-applications. With the growing scale of these applications and the data associated with them, scalable data management systems form a crucial part of the cloud infrastructure. Key-Value stores . such as Bigtable, PNUTS, Dynamo, and their open source analogues. have been the preferred data stores for applications in the cloud. In these systems, data is represented as Key-Value pairs, and atomic access is provided only at the granularity of single keys. While these properties work well for current applications, they are insufficient for the next generation web applications . such as online gaming, social networks, collaborative editing, and many more . which emphasize collaboration. Since collaboration by definition requires consistent access to groups of keys, scalable and consistent multi key access is critical for such applications. We propose the Key Group abstraction that defines a relationship between a group of keys and is the granule for on-demand transactional access. This abstraction allows the Key Grouping protocol to collocate control for the keys in the group to allow efficient access to the group of keys. Using the Key Grouping protocol, we design and implement G-Store which uses a key-value store as an underlying substrate to provide efficient, scalable, and transactional multi key access. Our implementation using a standard key-value store and experiments using a cluster of commodity machines show that G-Store preserves the desired properties of key-value stores, while providing multi key access functionality at a very low overhead.

I encountered this while researching the intrusive keys found in: KitaroDB [intrusive-keyed database]

The notion of a key group that consists of multiple keys, assigned to a single node, seems similar to me to the collection of item identifiers in the TMDM (13250-2) post merging.

I haven’t gotten into the details but supporting collections of item identifiers on top of scalable key-value stores is an interesting prospect.

I created a category for G-Store but had to add “(Multikey)” because in looking for more details on it, I encountered another G-Store that you will be interested in.

HyperDex: A Distributed, Searchable Key-Value Store for Cloud Computing

Thursday, February 23rd, 2012

HyperDex: A Distributed, Searchable Key-Value Store for Cloud Computing by Robert Escrivay, Bernard Wongz and Emin Güun Sirery.


Distributed key-value stores are now a standard component of high-performance web services and cloud computing applications. While key-value stores offer significant performance and scalability advantages compared to traditional databases, they achieve these properties through a restricted API that limits object retrieval—an object can only be retrieved by the (primary and only) key under which it was inserted. This paper presents HyperDex, a novel distributed key-value store that provides a unique search primitive that enables queries on secondary attributes. The key insight behind HyperDex is the concept of hyperspace hashing in which objects with multiple attributes are mapped into a multidimensional hyperspace. This mapping leads to efficient implementations not only for retrieval by primary key, but also for partially-specified secondary attribute searches and range queries. A novel chaining protocol enables the system to provide strong consistency guarantees while supporting replication. An evaluation of the full system shows that HyperDex is orders of magnitude faster than Cassandra and MongoDB for finding partially specified objects. Additionally, HyperDex achieves high performance for simple get/put operations compared to current state-of-the-art key-value stores, with stronger fault tolerance and comparable scalability properties.

This paper merited a separate posting from the software.

Among many interesting points was the following one from the introduction:

A naive Euclidean space construction, however, can suffer from the “curse of dimensionality,” as the space exhibits an exponential increase in volume with each additional secondary attribute [8]. For objects with many attributes, the resulting Euclidean space would be large, and consequently, sparse. Nodes would then be responsible for large regions in the hyperspace, which would increase the number of nodes whose regions intersect search hyperplanes and thus limit the effectiveness of the basic approach. HyperDex addresses this problem by introducing an efficient and lightweight mechanism that partitions the data into smaller, limited-size sub-spaces, where each subspace covers a subset of object attributes in a lower dimensional hyperspace. Thus, by folding the hyperspace back into a lower number of dimensions, HyperDex can ensure higher node selectivity during searches.

Something keeps nagging at me about the use of the term Euclidean space. Since a Euclidean space is a metric space, I “get” how they can partition metric data into smaller sub-spaces.

Names don’t exist in metric spaces but sort orders and frequencies are known well enough to approximate such a solution. Or are they? I assume for more common languages that is the case but that is likely a poor assumption on my part.

What of other non-metric space values? On what basis would they be partitioned?

HyperDex: A Searchable Distributed Key-Value Store

Thursday, February 23rd, 2012

HyperDex: A Searchable Distributed Key-Value Store

From the webpage:

HyperDex is a distributed, searchable key-value store. HyperDex provides a unique search primitive which enables searches over stored values. By design, HyperDex retains the performance of traditional key-value stores while enabling support for the search operation.

The key features of HyperDex are:

  • Fast HyperDex has lower latency and higher throughput than most other key-value stores.
  • Searchable HyperDex enables lookups of non-primary data attributes. Such searches are implemented efficiently and contact a small number of servers.
  • Scalable HyperDex scales as more machines are added to the system.
  • Consistent The value you GET is always the latest value you PUT. Not just "eventually," but immediately and always.
  • Fault tolerant HyperDex handles failures. Data is automatically
    replicated on multiple machines so that failures do not cause data loss.

Source code is available subject to this license.


Monday, January 23rd, 2012


From the webpage:

Scalaris is a scalable, transactional, distributed key-value store. It can be used for building scalable Web 2.0 services.

Scalaris uses a structured overlay with a non-blocking Paxos commit protocol for transaction processing with strong consistency over replicas. Scalaris is implemented in Erlang.

Following I found:

Our work is similar to Amazon’s SimpleDB, but additionally supports full ACID properties. Dynamo, in contrast, restricts itself to eventual consistency only. As a test case, we chose Wikipedia, the free encyclopedia, that anyone can edit. Our implementation serves approx. 2,500 transactions per second with just 16 CPUs, which is better than the public Wikipedia.

Be forewarned that the documentation is in Google Docs, which does not like Firefox on Ubuntu.

Sigh, back to browser wars, again? Says it will work with Google Chrome.


Thursday, November 3rd, 2011


From the features page:

RethinkDB is a persistent, industrial-strength key-value store with full support for the Memcached protocol.

Powerful technology

  • Ten times faster on solid-state
  • Linear scaling across cores
  • Fine-grained durability control
  • Instantaneous recovery on power failure

Supported core features

  • Point queries
  • Atomic increment/decrement
  • Arbitrary atomic operations
  • Append/prepend operations
  • Values up to 10MB in size
  • Pipelining support
  • Row expiration support
  • Multi-GET support

I particularly liked this line:

Can I use RethinkDB even if I don’t have solid-state drives in my infrastructure?

While RethinkDB performs best on dedicated commodity hardware that has a multicore processor and is backed by solid-state storage, it will still deliver a performance advantage both on rotational drives and in the cloud. (emphasis added to the answer)

Don’t worry your “rotational drives” and “cloud” account have not suddenly become obsolete. The skill you need to acquire before the next upgrade cycle is evaluating performance claims with your processes and data.

It doesn’t matter that all the UN documents can be retrieved in under sub-millisecond time, translated and served with a hot Danish if you don’t use the same format, have no need for translation and are more a custard tart fan. Vendor performance figures may attract your interest but your decision making should be driven by performance figures that represent your environment.

Build into the acquisition budget funding for your staff to replicate a representative subset of your data and processes for testing with vendor software/hardware. True enough, after the purchase you will probably toss that subset, but remember you will be living with the software purchase for years. And be known as the person who managed the project. Suddenly spending a little more money on making sure your requirements are met doesn’t sound so bad.

AZOrange – Machine Learning for QSAR Modeling

Friday, July 29th, 2011

AZOrange – High performance open source machine learning for QSAR modeling in a graphical programming environment Jonna C Stalring, Lars A Carlsson, Pedro Almeida and Scott Boyer. Journal of Cheminformatics 2011, 3:28doi:10.1186/1758-2946-3-28


Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In granting the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community.

Project homepage: AZOrange (Ubuntu, I assume it compiles and runs on other *nix platforms. I run Ubuntu so I need to setup another *nix distribution just for test purposes.)

Fully De-Amortized Cuckoo Hashing for Cache-Oblivious Dictionaries and Multimaps

Monday, July 25th, 2011

Fully De-Amortized Cuckoo Hashing for Cache-Oblivious Dictionaries and Multimaps by Michael T. Goodrich, Daniel S. Hirschberg, Michael Mitzenmacher, and Justin Thaler.


A dictionary (or map) is a key-value store that requires all keys be unique, and a multimap is a key-value store that allows for multiple values to be associated with the same key. We design hashing-based indexing schemes for dictionaries and multimaps that achieve worst-case optimal performance for lookups and updates, with a small or negligible probability the data structure will require a rehash operation, depending on whether we are working in the the external-memory (I/O) model or one of the well-known versions of the Random Access Machine (RAM) model. One of the main features of our constructions is that they are fully de-amortized, meaning that their performance bounds hold without one having to tune their constructions with certain performance parameters, such as the constant factors in the exponents of failure probabilities or, in the case of the external-memory model, the size of blocks or cache lines and the size of internal memory (i.e., our external-memory algorithms are cache oblivious). Our solutions are based on a fully de-amortized implementation of cuckoo hashing, which may be of independent interest. This hashing scheme uses two cuckoo hash tables, one “nested” inside the other, with one serving as a primary structure and the other serving as an auxiliary supporting queue/stash structure that is super-sized with respect to traditional auxiliary structures but nevertheless adds negligible storage to our scheme. This auxiliary structure allows the success probability for cuckoo hashing to be very high, which is useful in cryptographic or data-intensive applications.

Would you say that topic maps qualify as “data-intensive applications?”

Or does that depend upon the topic map?

Voldemort V0.9 Released: NIO, Pipelined FSM, Hinted Handoff

Wednesday, July 20th, 2011

Voldemort V0.9 Released: NIO, Pipelined FSM, Hinted Handoff

From Alex Popescu’s myNoSQL, links to commentary on the latest release.

See also:


Sunday, July 10th, 2011


From the website:

What is Flintstone?

A key/value database store using flat files for PHP.

Features include:

  • Memory efficient
  • File locking
  • Caching
  • Gzip compression
  • Easy to use

Since I have covered other key/value store databases I thought I should include this one as well.

Could be useful for quick mockups.

Although I am mindful of a developer who complained to me about a script that was supposed to be too limited for long term production use by the customer. Later found out it met their production requirements quite nicely.

The lesson I drew from that was a need for greater transparency with customers and licensing.

Putting and Getting Data from a Database

Monday, May 2nd, 2011

Putting and Getting Data from a Database

Overview of database structures and and data operations on those structures by Marko A. Rodriguez.

Marko covers primitive, key-value, and document stores, plus graph databases.

There are a number of database structures in addition to those four, although those are certainly popular ones.

I would not put too much stock into claims about one form of technology or another until I saw it in operation with my data.

That software works great with someone else’s data isn’t all that interesting, either for you or your manager.

Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud

Tuesday, April 26th, 2011

Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud by Alexander G. Connor, Panos K. Chrysanthis, Alexandros Labrinidis.

More slides from GDM 2011.

Some of the slides don’t present well but you can get the gist of what is being said.

Develops an extension to key-value, key-key-value store, which partitions data into sets of related keys.

Not enough detail. Will watch for the paper.

Square Pegs and Round Holes in the NOSQL World

Friday, April 22nd, 2011

Square Pegs and Round Holes in the NOSQL World

Jim Webber reviews why graph databases (such as Neo4J) are better for storing graphs than Key-Value, Document or relational datastores.

He concludes:

In these kind of situations, choosing a non-graph store for storing graphs is a gamble. You may find that you’ve designed your graph topology far too early in the system lifecycle and lose the ability to evolve the structure and perform business intelligence on your data. That’s why Neo4j is cool – it keeps graph and application concerns separate, and allows you to defer data modelling decisions to more responsible points throughout the lifetime of your application.

You know, we could say the same thing about topic maps, that you don’t have to commit to all modeling decisions up front.

Something to think about.

Hank: A Fast, Open-Source, Batch-Updatable, Distributed Key-Value Store

Wednesday, March 16th, 2011

Hank: A Fast, Open-Source, Batch-Updatable, Distributed Key-Value Store

From the RapLeaf blog:

We’re really excited to announce the open-source debut of a cool piece of Rapleaf’s internal infrastructure, a distributed database project we call Hank.

Our use case is very particular: we have tons of data that needs to get processed, producing a lot of data points for individual people, which then need to be made randomly accessible so they can be served through our API. You can think of it as the “process and publish” pattern.

For the processing component, Hadoop and Cascading were an obvious choice. However, making our results randomly accessible for the API was more challenging. We couldn’t find an existing solution that was fast, scalable, and perhaps most importantly, wouldn’t degrade performance during updates. Our API needs to have lightning-fast responses so that our customers can use it in realtime to personalize their users’ experiences, and it’s just not acceptable for us to have periods where reads contend with writes while we’re updating.


  1. Random reads need to be fast – reliably on the order of a few milliseconds.
  2. Datastores need to scale to terabytes, with keys and values on the order of kilobytes.
  3. We need to be able to push out hundreds of millions of updates a day, but they don’t have to happen in realtime. Most will come from our Hadoop cluster.
  4. Read performance should not suffer while updates are in progress.


  1. During the update process, it doesn’t matter if there is more than one version of our datastores available. Our application is tolerant of this inconsistency.
  2. We have no need for random writes.

If you read updates as merges then the relevance of this posting to topic maps becomes a bit clearer. 😉

Not all topic map systems will have the same requirements and non-requirements.

(This resource pointed out to me by Jack Park.)

Node.js Key-Value Stores – Post

Friday, January 28th, 2011

Node.js Key-Value Stores

A blog post by Alex Popescu on the use of node.js as a key-value store.

  • Alfred: in-process key-value store. You can read more about it here
  • node-dirty: tiny and fast key-value store with append-only disk log

And I’d bet there are more to come. Real question though, is how many will be around in 6-12 months.

I suspect the number of node.js key-value stores will increase and then decrease.

I am not sure I understand the implied problem?

But then I remember when there were > 300+ formats for document conversion.

There are fewer than that now. Well, at least major ones. 😉

Is an upsurge in solutions, experimentation and a winnowing down, until the next explosion of creativity, a bad thing?