Archive for the ‘B-trees’ Category

Exotic Functional Data Structures: Hitchhiker Trees

Sunday, September 18th, 2016


Functional data structures are awesome–they’re the foundation of many functional programming languages, allowing us to express complex logic immutably and efficiently. There is one unfortunate limitation: these data structures must fit on the heap, limiting their lifetime to that of the process. Several years ago, Datomic appeared as the first functional database that addresses these limitations. However, there hasn’t been much activity in the realm of scalable (gigabytes to terabytes) functional data structures.

In this talk, we’ll first review some of the fundamental principles of functional data structures, particularly trees. Next, we’ll review what a B tree is and why it’s better than other trees for storage. Then, we’ll learn about a cool variant of a B tree called a fractal tree, how it can be made functional, and why it has phenomenal performance. Finally, we’ll unify these concepts to understand the Hitchhiker tree, an open-source functionally persistent fractal tree. We’ll also briefly look at an example API for using Hitchhiker trees that allows your application’s state to be stored off-heap, in the spirit of the 2014 paper “Fast Database Restarts at Facebook”.

David Greenberg (profile)

Hitchhiker Trees (GitHub)

Fast Database Restarts at Facebook by Aakash Goel, Bhuwan Chopra, Ciprian Gerea, Dhrúv Mátáni, Josh Metzler, Fahim Ul Haq, Janet Wiener.

You could have searched for all the information I have included, but isn’t it more convenient to have it “already found?”

The Bw-Tree: A B-tree for New Hardware Platforms

Tuesday, April 15th, 2014

The Bw-Tree: A B-tree for New Hardware Platforms by Justin J. Levandoski, David B. Lomet, and, Sudipta Sengupta.


The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B-tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.

With easy availability of multi-core chips, what new algorithms are you going to discover while touring SICP or TAOCP?

Is that going to be an additional incentive to tour one or both of them?

Fractal Tree Indexing Overview

Monday, December 10th, 2012

Fractal Tree Indexing Overview by Martin Farach-Colton.

From the post:

We get a lot of questions about how Fractal Tree indexes work. It’s a write-optimized index with fast queries, but which write-optimized indexing structure is it?

In this ~15 minute video (which uses these slides), I give a quick overview of how they work and what they are good for.

Suggestion: Watch the video along with the slides. (Some of the slides are less than intuitive. Trust me on this one.)

Martin Gardner explaining fractals in SciAm it’s not but it will give you a better appreciation for fractal trees.

BTW, did you know B-Trees are forty years old this year?

Distributed Graph DB Reading Club Returns! (4th April 2012)

Monday, April 2nd, 2012

Distributed Graph DB Reading Club Returns! (4th April 2012)

René Pickhardt writes:

The reading club was quite inactive due to traveling and also a not optimal process for the choice of literature. That is why a new format for the reading club has been discussed and agreed upon.

The new Format means that we have 4 new rules

  1. we will only discuss up to 3 papers in 90 minutes of time. So rough speaking we have 30 minutes per paper but this does not have to be strict.
  2. The decided papers should be read by everyone before the reading club takes place.
  3. For every paper there is one responsible person (moderator) who did read the entire paper before he suggested it as a common reading.
  4. Open questions to the (potential) reading assignments and ideas for reading can and should be discussed on (use the same template as I used for the reading assignments in this blogpost) eg:

Paper download:
Why to read it
topics to discuss / open questions:

For next meeting on April 4th 2 pm CET (in two days) the literature will be:

In case you haven’t noticed, the 4th of April is day after tomorrow!

Time for the reading glasses and strong coffee!

Cache-Oblivious Search Trees Project (Fractal Trees, TokuDB)

Thursday, November 3rd, 2011

Cache-Oblivious Search Trees Project (Fractal Trees, TokuDB)

I watched a very disappointing presentation on Fractal Trees (used by Tokutek in the TokuDB) and so went looking for better resources.

The project is described as:

We implemented a cache-oblivious dynamic search tree as an alternative to the ubiquitious B-tree. We used a binary tree with a “van Emde Boas” layout whose leaves point to intervals in a “packed memory structure”. The search tree supports efficient lookup, as well as efficient amortized insertion and deletion. Efficient implementation of a B-tree requires understanding the cache-line size and page size and is optimized for a specific memory hierarchy. In contrast, a cache-oblivious dynamic search tree contains no machine-dependent variables, performs well on any memory hierarchy, and requires minimal user-level memory management. For random insertion of data, the data structure performs better than the Berkeley DB and a good implementation of B-trees. Another advantage of my data structure is that the packed memory array maintains data in sorted order, allows sequential reads at high speeds, and data insertions and deletions with few data writes on average. In addition, the data structure is easy to implement because he employed memory mapping rather than making the data stored on disk be a single level store.

We also have designed cache-oblivious search trees for which the keys can be very long (imagine a key, such as a DNA sequence, that is larger than main memory), and trees into which data can be quickly inserted.

One essential difference is that the B-Tree supports random I/O and the Fractal Tree converts random I/O into sequential I/O, which operates at near disk bandwidth speeds.

At Tokutek, I would review the paper by Bradley C. Kuszmaul, How TokuDB Fractal Tree™ IndexesWork.

Impressive software for the right situation.

The background literature is interesting. Not sure if directly applicable to topic maps or not.

Stratified B-Tree and Versioned Dictionaries

Monday, July 25th, 2011

Stratified B-Tree and Versioned Dictionaries by Andy Twigg (Acunu). (video)


A classic versioned data structure in storage and computer science is the copy-on-write (CoW) B-tree — it underlies many of today’s file systems and databases, including WAFL, ZFS, Btrfs and more. Unfortunately, it doesn’t inherit the B-tree’s optimality properties; it has poor space utilization, cannot offer fast updates, and relies on random IO to scale. Yet, nothing better has been developed since. We describe the `stratified B-tree’, which beats all known semi-external memory versioned B-trees, including the CoW B-tree. In particular, it is the first versioned dictionary to achieve optimal tradeoffs between space, query and update performance.

I haven’t had time to watch the video but you can find some other resources on stratified B-Trees at Andy’s post All about stratified B-trees.