Archive for the ‘Fractal Trees’ Category

Exotic Functional Data Structures: Hitchhiker Trees

Sunday, September 18th, 2016


Functional data structures are awesome–they’re the foundation of many functional programming languages, allowing us to express complex logic immutably and efficiently. There is one unfortunate limitation: these data structures must fit on the heap, limiting their lifetime to that of the process. Several years ago, Datomic appeared as the first functional database that addresses these limitations. However, there hasn’t been much activity in the realm of scalable (gigabytes to terabytes) functional data structures.

In this talk, we’ll first review some of the fundamental principles of functional data structures, particularly trees. Next, we’ll review what a B tree is and why it’s better than other trees for storage. Then, we’ll learn about a cool variant of a B tree called a fractal tree, how it can be made functional, and why it has phenomenal performance. Finally, we’ll unify these concepts to understand the Hitchhiker tree, an open-source functionally persistent fractal tree. We’ll also briefly look at an example API for using Hitchhiker trees that allows your application’s state to be stored off-heap, in the spirit of the 2014 paper “Fast Database Restarts at Facebook”.

David Greenberg (profile)

Hitchhiker Trees (GitHub)

Fast Database Restarts at Facebook by Aakash Goel, Bhuwan Chopra, Ciprian Gerea, Dhrúv Mátáni, Josh Metzler, Fahim Ul Haq, Janet Wiener.

You could have searched for all the information I have included, but isn’t it more convenient to have it “already found?”

Why Unique Indexes are Bad [Caveat on Fractal Tree(R) Indexes]

Monday, July 15th, 2013

Why Unique Indexes are Bad by Zardosht Kasheff.

From the post:

Before creating a unique index in TokuMX or TokuDB, ask yourself, “does my application really depend on the database enforcing uniqueness of this key?” If the answer is ANYTHING other than yes, do not declare the index to be unique. Why? Because unique indexes may kill your write performance. In this post, I’ll explain why.

Unique indexes are a strange beast: they have no impact on standard databases that use B-Trees, such as MongoDB and MySQL, but may be horribly painful for databases that use write optimized data structures, like TokuMX’s Fractal Tree(R) indexes. How? They essentially drag the Fractal Tree index down to the B-Tree’s level of performance.

When a user declares a unique index, the user tells the database, “please help me and enforce uniqueness on this index.” So, before doing any insertion into a unique index, the database must first verify that the key being inserted does not already exist. If the possible location of the key is not in memory, which may happen if the working set does not fit in memory, then the database MUST perform an I/O to bring into memory the contents of the potential location (be it a leaf node in a tree, or an offset into a memory mapped file), in order to check whether the key exists in that location.


Zardosht closes by recommending if your application does require unique indexes that you consider re-writing it so it doesn’t.


Not a mark against Fractal Tree(R) indexes but certainly a consideration in deciding to adopt technology using them.

Would be nice if this type of information could be passed along as more than sysadmin lore.

Like a plugin for your browser that at your request highlights products or technologies of interest and on mouse-over displays known limitations or bugs.

The sort of things that vendors loath to disclose.

TokuMX: High Performance for MongoDB

Friday, June 21st, 2013

TokuMX: High Performance for MongoDB

From the webpage:

TokuMXTM for MongoDB is here!

Tokutek, whose Fractal Tree® indexing technology has brought dramatic performance and scalability to MySQL and MariaDB users, now brings those same benefits to MongoDB users.

TokuMX is open source performance-enhancing software for MongoDB that make MongoDB more performant in large application with demanding requirements. In addition to replacing B-tree indexing with more modern technology, TokuMX adds transaction support, document-level locking for concurrent writes, and replication.

You have seen the performance specs on Fractal Tree indexing.

Now they are available for MongoDB!

Open Source TokuDB Resources

Saturday, April 27th, 2013

Open Source TokuDB Resources

A quick summary of the Tokutek repositories at Github and pointers to Google groups for discussion of TokuDB.

Wanted: Evaluators to Try MongoDB with Fractal Tree Indexing

Tuesday, March 26th, 2013

Wanted: Evaluators to Try MongoDB with Fractal Tree Indexing by Tim Callaghan.

From the post:

We recently resumed our discussion around bringing Fractal Tree indexes to MongoDB. This effort includes Tokutek’s interview with Jeff Kelly at Strata as well as my two recent tech blogs which describe the compression achieved on a generic MongoDB data set and performance improvements we measured using on our implementation of Sysbench for MongoDB. I have a full line-up of benchmarks and blogs planned for the next few months, as our project continues. Many of these will be deeply technical and written by the Tokutek developers.

We have a group of evaluators running MongoDB with Fractal Tree Indexes, but more feedback is always better. So …

Do you want to participate in the process of bringing high compression and extreme performance gains to MongoDB? We’re looking for MongoDB experts to test our build on your real-world workloads and benchmarks. Evaluator feedback will be used in creating the product road map. Please email me at if interested.

You keep reading about the performance numbers on MongoDB.

Aren’t you curious if those numbers are true for your use case?

Here’s your opportunity to find out!

MongoDB + Fractal Tree Indexes = High Compression

Friday, March 1st, 2013

MongoDB + Fractal Tree Indexes = High Compression by Tim Callaghan.

You may have heard that MapR Technologies broke the MinuteSort Record by sorting 15 billion 100-btye records in 60 seconds. Used 2,103 virtual instances in the Google Compute Engine and each instance had four virtual cores and one virtual disk, totaling 8,412 virtual cores and 2,103 virtual disks. Google Compute Engine, MapR Break MinuteSort Record.

So, the next time you have 8,412 virtual cores and 2,103 virtual disks, you know what is possible, 😉

But if you have less firepower than that, you will need to be clever:

One doesn’t have to look far to see that there is strong interest in MongoDB compression. MongoDB has an open ticket from 2009 titled “Option to Store Data Compressed” with Fix Version/s planned but not scheduled. The ticket has a lot of comments, mostly from MongoDB users explaining their use-cases for the feature. For example, Khalid Salomão notes that “Compression would be very good to reduce storage cost and improve IO performance” and Andy notes that “SSD is getting more and more common for servers. They are very fast. The problems are high costs and low capacity.” There are many more in the ticket.

In prior blogs we’ve written about significant performance advantages when using Fractal Tree Indexes with MongoDB. Compression has always been a key feature of Fractal Tree Indexes. We currently support the LZMA, quicklz, and zlib compression algorithms, and our architecture allows us to easily add more. Our large block size creates another advantage as these algorithms tend to compress large blocks better than small ones.

Given the interest in compression for MongoDB and our capabilities to address this functionality, we decided to do a benchmark to measure the compression achieved by MongoDB + Fractal Tree Indexes using each available compression type. The benchmark loads 51 million documents into a collection and measures the size of all files in the file system (–dbpath).

More benchmarks to follow and you should remember that all benchmarks are just that, benchmarks.

Benchmarks do not represent experience with your data, under your operating load and network conditions, etc.

Investigate software based on the first, purchase software based on the second.

NoSQL is Great, But You Still Need Indexes [MongoDB for example]

Wednesday, February 20th, 2013

NoSQL is Great, But You Still Need Indexes by Martin Farach-Colton.

From the post:

I’ve said it before, and, as is the nature of these things, I’ll almost certainly say it again: your database performance is only as good as your indexes.

That’s the grand thesis, so what does that mean? In any DB system — SQL, NoSQL, NewSQL, PostSQL, … — data gets ingested and organized. And the system answers queries. The pain point for most users is around the speed to answer queries. And the query speed (both latency and throughput, to be exact) depend on how the data is organized. In short: Good Indexes, Fast Queries; Poor Indexes, Slow Queries.

But building indexes is hard work, or at least it has been for the last several decades, because almost all indexing is done with B-trees. That’s true of commercial databases, of MySQL, and of most NoSQL solutions that do indexing. (The ones that don’t do indexing solve a very different problem and probably shouldn’t be confused with databases.)

It’s not true of TokuDB. We build Fractal Tree Indexes, which are much easier to maintain but can still answer queries quickly. So with TokuDB, it’s Fast Indexes, More Indexes, Fast Queries. TokuDB is usually thought of as a storage engine for MySQL and MariaDB. But it’s really a B-tree substitute, so we’re always on the lookout for systems where we can improving the indexing.

Enter MongoDB. MongoDB is beloved because it makes deployment fast. But when you peel away the layers, you get down to a B-tree, with all the performance headaches and workarounds that they necessitate.

That’s the theory, anyway. So we did some testing. We ripped out the part of MongoDB that takes care of secondary indices and plugged in TokuDB. We’ve posted the blogs before, but here they are again, the greatest hits of TokuDB+MongoDB: we show a 10x insertion performance, a 268x query performance, and a 532x (or 53,200% if you prefer) multikey index insertion performance. We also discussed covered indexes vs. clustered Fractal Tree Indexes.

Did somebody declare February 20th to be performance release day?

Did I miss that memo? 😉

Like every geek, I like faster. But, here’s my question:

Have there been any studies on the impact of faster systems on searching and decision making by users?

My assumption is the faster I get a non-responsive result from a search, the sooner I can improve it.

But that’s an assumption on my part.

Is that really true?

Concurrency Improvements in TokuDB v6.6 (Part 2)

Tuesday, February 5th, 2013

Concurrency Improvements in TokuDB v6.6 (Part 2)

From the post:

In Part 1, we showed performance results of some of the work that’s gone in to TokuDB v6.6. In this post, we’ll take a closer look at how this happened, on the engineering side, and how to think about the performance characteristics in the new version.


It’s easiest to think about our concurrency changes in terms of a Fractal Tree® index that has nodes like a B-tree index, and buffers on each node that batch changes for the subtree rooted at that node. We have materials that describe this available here, but we can proceed just knowing that:

  1. To inject data into the tree, you need to store a message in a buffer at the root of the tree. These messages are moved down the tree, so you can find messages in all the internal nodes of the tree (the mechanism that moves them is irrelevant for now).
  2. To read data out of the tree, you need to find a leaf node that contains your key, check the buffers on the path up to the root for messages that affect your query, and apply any such messages to the value in the leaf before using that value to answer your query.

It’s these operations that modify and examine the buffers in the root that were the main reason we used to serialize operations inside a single index.

Just so not everything today is “soft” user stuff. 😉

Interesting avoidance of the root node as an I/O bottleneck.

Sort of thing that gets me to thinking about distributed topic map writing/querying.

Fractal Tree Indexing Overview

Monday, December 10th, 2012

Fractal Tree Indexing Overview by Martin Farach-Colton.

From the post:

We get a lot of questions about how Fractal Tree indexes work. It’s a write-optimized index with fast queries, but which write-optimized indexing structure is it?

In this ~15 minute video (which uses these slides), I give a quick overview of how they work and what they are good for.

Suggestion: Watch the video along with the slides. (Some of the slides are less than intuitive. Trust me on this one.)

Martin Gardner explaining fractals in SciAm it’s not but it will give you a better appreciation for fractal trees.

BTW, did you know B-Trees are forty years old this year?

Best Practices for a Successful TokuDB Evaluation (Webinar)

Thursday, November 29th, 2012

Best Practices for a Successful TokuDB Evaluation by Gerry Narvaja

Date: December 11th
Time: 2 PM EST / 11 AM PST

From the webpage:

In this webinar we will show step by step how to install, configure, and test TokuDB for a typical performance evaluation. We’ll also be flagging potential pitfalls that can ruin the eval results. It will describe the differences between installing from scratch and replacing an existing MySQL / MariaDB installation. It will also review the most common issues that may arise when running TokuDB binaries.

You have seen the TokuDB numbers on their data.

Now you can see what numbers you can get with your data.

MongoDB and Fractal Tree Indexes (Webinar) [13 November 2012]

Wednesday, October 31st, 2012

Webinar: MongoDB and Fractal Tree Indexes by Tim Callaghan.

From the post:

This webinar covers the basics of B-trees and Fractal Tree Indexes, the benchmarks we’ve run so far, and the development road map going forward.

Date: November 13th
Time: 2 PM EST / 11 AM PST

If you aren’t familiar with Fractal Tree Indexes and MongoDB, this is your opportunity to catch up!

Report on XLDB Tutorial on Data Structures and Algorithms

Tuesday, October 16th, 2012

Report on XLDB Tutorial on Data Structures and Algorithms by Michael Bender.

From the post:

The tutorial was organized as follows:

  • Module 0: Tutorial overview and introductions. We describe an observed (but not necessary) tradeoff in ingestion, querying, and freshness in traditional database.
  • Module 1: I/O model and cache-oblivious analysis.
  • Module 2: Write-optimized data structures. We give the optimal trade-off between inserts and point queries. We show how to build data structures that lie on this tradeoff curve.
  • Module 2 continued: Write-optimized data structures perform writes much faster than point queries; this asymmetry affects the design of an ACID compliant database.
  • Module 3: Case study – TokuFS. How to design and build a write-optimized file systems.
  • Module 4: Page-replacement algorithms. We give relevant theorems on the performance of page-replacement strategies such as LRU.
  • Module 5: Index design, including covering indexes.
  • Module 6: Log-structured merge trees and fractional cascading.
  • Module 7: Bloom filters.

These algorithms and data structures are used both in NoSQL implementations such as MongoDB, HBase and in SQL-oriented implementations such as MySQL and TokuDB.

The slides are available here.

A tutorial offered by Michael and Bradley C. Kuszmaul at the 6th XLDB conference.

If you are committed to defending your current implementation choices against all comers, don’t bother with the slides.

If you want a peek at one future path in data structures, get the slides. You won’t be disappointed.

Forbes: “Tokutek Makes Big Data Dance”

Saturday, October 6th, 2012

Forbes: “Tokutek Makes Big Data Dance” by Lawrence Schwartz.

From the post:

Recently, our CEO, John Partridge had a chance to talk about novel database technologies for “Big Data” with Peter Cohan of Forbes.

According to the article, “Fractal Tree indexing is helping organizations analyze big data more efficiently due to its ability to improve database efficiency thanks to faster ‘database insertion speed, quicker input/output performance, operational agility, and data compression.’” As a start-up based on “the first algorithm-based breakthrough in the database world in 40 years,” Toktuetek is following in the footsteps of firms such as Google and RSA, which also relied on novel algortithm advances as core to their technology.

To read the full article, and to see how Tokutek is helping companies tackle big data, see here.

I would ignore Peter Cohan’s mistakes about the nature of credit card processing. You don’t wait for the “ok” on your account balance.

Remember What if all transactions required strict global consistency? by Matthew Aslett of the 451 Group? Eventual consistency works right now.

I would have picked “hot schema” changes as a feature to highlight but that might not play as well with a business audience.

Webinar: Introduction to TokuDB v6.5 (Oct. 10, 2012)

Saturday, October 6th, 2012

Webinar: Introduction to TokuDB v6.5

From the post:

TokuDB® is a proven solution that scales MySQL® and MariaDB® from GBs to TBs with unmatched insert and query speed, compression, replication performance and online schema flexibility. Tokutek’s recently launched TokuDB v6.5 delivers all of these features and more, not just for HDDs, but also for flash memory.

Date: October 10th
Time: 2 PM EST / 11 AM PST

TokuDB v6.5:

  • Stores 10x More Data – TokuDB delivers 10x compression without any performance degradation. Users can therefore take advantage of much greater amounts of available space without paying more for additional storage.
  • Delivers High Insertion Speed – TokuDB Fractal Tree® indexes continue to change the game with huge insertion rates and greater scalability. Our latest release delivers an order of magnitude faster insertion performance than the competition, ideal for applications that must simultaneously query and update large volumes of rapidly arriving data (e.g., clickstream analytics).
  • Allows Hot Schema Changes — Hot column addition/deletion/rename/resize provides the ability to add/drop/change a column to a database without taking the database offline, enabling database administrators to redefine or add new fields with no downtime.
  • Extends Wear Life for Flash– TokuDB’s proprietary Fractal Tree indexing writes fewer, larger blocks which reduces overall wear, and more efficiently utilizes the FTL (Flash Translation Layer). This extends the life of flash memory by an order of magnitude for many applications.

This webinar covers TokuDB features, latest performance results, and typical use cases.

You have seen the posts about fractal indexing! Now see the demos!

Three Ways that Fractal Tree Indexes Improve SSD for MySQL

Friday, September 28th, 2012

Three Ways that Fractal Tree Indexes Improve SSD for MySQL

The three advantages:

  • Advantage 1: Index maintenence performance.
  • Advantage 2: Compression.
  • Advantage 3: Reduced wear.

See the post for details and the impressive numbers one expects from Fractal tree indexes.

Looking for MongoDB users to test Fractal Tree Indexing

Friday, September 14th, 2012

Looking for MongoDB users to test Fractal Tree Indexing by Tim Callaghan.

In my three previous blogs I wrote about our implementation of Fractal Tree Indexes on MongoDB, showing a 10x insertion performance increase, a 268x query performance increase, and a comparison of covered indexes and clustered indexes. The benchmarks show the difference that rich and efficient indexing can make to your MongoDB workload.

It’s one thing for us to benchmark MongoDB + TokuDB and another to measure real world performance. If you are looking for a way to improve the performance or scalability of your MongoDB deployment, we can help and we’d like to hear from you. We have a preview build available for MongoDB v2.2 that you can run with your existing data folder, drop/add Fractal Tree Indexes, and measure the performance differences. Please email me at if interested.

Here is your chance to try these speed improvements out on your data!

MongoDB Index Shootout: Covered Indexes vs. Clustered Fractal Tree Indexes

Friday, September 7th, 2012

MongoDB Index Shootout: Covered Indexes vs. Clustered Fractal Tree Indexes by Tim Callaghan.

From the post:

In my two previous blogs I wrote about our implementation of Fractal Tree Indexes on MongoDB, showing a 10x insertion performance increase and a 268x query performance increase. MongoDB’s covered indexes can provide some performance benefits over a regular MongoDB index, as they reduce the amount of IO required to satisfy certain queries. In essence, when all of the fields you are requesting are present in the index key, then MongoDB does not have to go back to the main storage heap to retrieve anything. My benchmark results are further down in this write-up, but first I’d like to compare MongoDB’s Covered Indexes with Tokutek’s Clustered Fractal Tree Indexes.

MongoDB Covered Indexes Tokutek Clustered Fractal Tree Indexes
Query Efficiency Improved when all requested fields are part of index key Always improved, all non-keyed fields are stored in the index
Index Size Data is not compressed Generally 10x to 20x compression, user selects zlib, quicklz, or lzma. Note that non-clustered indexes are compressed as well.
Planning/Maintenance Index “covers” a fixed set of fields, adding a new field to an existing covered index requires a drop and recreate of the index. None, all fields in the document are always available in the index.

When putting my ideas together for the above table it struck me that covered indexes are really about a well defined schema, yet NoSQL is often thought of as “schema-less”. If you have a very large MongoDB collection and add a new field that you want covered by an existing index, the drop and recreate process will take a long time. On the other hand, a clustered Fractal Tree Index will automatically include this new field so there is no need to drop/recreate unless you need the field to be part of a .find() operation itself.

If you have some time to experiment this weekend, more MongoDB benchmarks/improvements to consider.

268x Query Performance Bump for MongoDB

Sunday, September 2nd, 2012

268x Query Performance Increase for MongoDB with Fractal Tree Indexes, SAY WHAT? by Tim Callaghan.

From the post:

Last week I wrote about our 10x insertion performance increase with MongoDB. We’ve continued our experimental integration of Fractal Tree® Indexes into MongoDB, adding support for clustered indexes. A clustered index stores all non-index fields as the “value” portion of the index, as opposed to a standard MongoDB index that stores a pointer to the document data. The benefit is that indexed lookups can immediately return any requested values instead of needing to do an additional lookup (and potential disk IOs) for the requested fields.

I’m trying to recover from learning about scalable subgraph matching, Efficient Subgraph Matching on Billion Node Graphs [Parallel Graph Processing], and now the nice folks at Tokutek post a 26,816% query performance increase for MongoDB.

They claim to not be MongoDB experts. I guess that’s right. The increase in performance would have been higher. 😉

Serious question: How long will it take this sort of performance increase to impact the modeling and design of information systems?

And in what way?

With high enough performance, can subject identity be modeled interactively?

MongoDB: Pumping Fractal Iron

Sunday, August 26th, 2012

10x Insertion Performance Increase for MongoDB with Fractal Tree Indexes by Tim Callaghan.

From the post:

The challenge of handling massive data processing workloads has spawned many new innovations and techniques in the database world, from indexing innovations like our Fractal Tree® technology to a myriad of “NoSQL” solutions (here is our Chief Scientist’s perspective). Among the most popular and widely adopted NoSQL solutions is MongoDB and we became curious if our Fractal Tree indexing could offer some advantage when combined with it. The answer seems to be a strong “yes”.

Earlier in the summer we kicked off a small side project and here’s what we did: we implemented a “version 2” IndexInterface as a Fractal Tree index and ran some benchmarks. Note that our integration only affects MongoDB’s secondary indexes; primary indexes continue to rely on MongoDB’s indexing code. All the changes we made to the MongoDB source are available here. Caveat: this was a quick and dirty project – the code is experimental grade so none of it is supported or went through any careful design analysis.

For our initial benchmark we measured the performance of a single threaded insertion workload. The inserted documents contained the following: URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp). We created a total of four secondary indexes: URI, name, origin, and creation date. The point of the benchmark is to insert enough documents such that the indexes are larger than main memory and show the insertion performance from an empty database to one that is largely dependent on disk IO. We ran the benchmark with journaling disabled, then again with journaling enabled.

Not for production use but the performance numbers should give you pause.

A long pause.

Fractal Tree Indexes and Mead – MySQL Meetup

Wednesday, January 11th, 2012

Fractal Tree Indexes and Mead – MySQL Meetup

From the post:

As a brief overview – most databases employ B-trees to achieve a good tradeoff between the ability to update data quickly and to search it quickly. It turns out that B-trees are far from the optimum in this tradeoff space. This led to the development at MIT, Rutgers and Stony Brook of Fractal Tree indexes. Fractal Tree indexes improve MySQL® scalability and query performance by allowing greater insertion rates, supporting rich indexing and offering efficient compression. They can also eliminate operational headaches such as dump/reloads, inflexible schemas and partitions.

The presentation provides an overview on how Fractal Tree indexes work, and then gets into some specific product features, benchmarks, and customer use cases that show where people have deployed Fractal Tree indexes via the TokuDB® storage engine.

Whether you are just browsing or seriously looking for better performance, I think you will like this presentation.

Performance of data stores is an issue for topic maps whether you store a final “merged” result or simply present “merged” results to users.

Cache-Oblivious Search Trees Project (Fractal Trees, TokuDB)

Thursday, November 3rd, 2011

Cache-Oblivious Search Trees Project (Fractal Trees, TokuDB)

I watched a very disappointing presentation on Fractal Trees (used by Tokutek in the TokuDB) and so went looking for better resources.

The project is described as:

We implemented a cache-oblivious dynamic search tree as an alternative to the ubiquitious B-tree. We used a binary tree with a “van Emde Boas” layout whose leaves point to intervals in a “packed memory structure”. The search tree supports efficient lookup, as well as efficient amortized insertion and deletion. Efficient implementation of a B-tree requires understanding the cache-line size and page size and is optimized for a specific memory hierarchy. In contrast, a cache-oblivious dynamic search tree contains no machine-dependent variables, performs well on any memory hierarchy, and requires minimal user-level memory management. For random insertion of data, the data structure performs better than the Berkeley DB and a good implementation of B-trees. Another advantage of my data structure is that the packed memory array maintains data in sorted order, allows sequential reads at high speeds, and data insertions and deletions with few data writes on average. In addition, the data structure is easy to implement because he employed memory mapping rather than making the data stored on disk be a single level store.

We also have designed cache-oblivious search trees for which the keys can be very long (imagine a key, such as a DNA sequence, that is larger than main memory), and trees into which data can be quickly inserted.

One essential difference is that the B-Tree supports random I/O and the Fractal Tree converts random I/O into sequential I/O, which operates at near disk bandwidth speeds.

At Tokutek, I would review the paper by Bradley C. Kuszmaul, How TokuDB Fractal Tree™ IndexesWork.

Impressive software for the right situation.

The background literature is interesting. Not sure if directly applicable to topic maps or not.