Cache-Oblivious Search Trees Project (Fractal Trees, TokuDB)
I watched a very disappointing presentation on Fractal Trees (used by Tokutek in the TokuDB) and so went looking for better resources.
The project is described as:
We implemented a cache-oblivious dynamic search tree as an alternative to the ubiquitious B-tree. We used a binary tree with a “van Emde Boas” layout whose leaves point to intervals in a “packed memory structure”. The search tree supports efficient lookup, as well as efficient amortized insertion and deletion. Efficient implementation of a B-tree requires understanding the cache-line size and page size and is optimized for a specific memory hierarchy. In contrast, a cache-oblivious dynamic search tree contains no machine-dependent variables, performs well on any memory hierarchy, and requires minimal user-level memory management. For random insertion of data, the data structure performs better than the Berkeley DB and a good implementation of B-trees. Another advantage of my data structure is that the packed memory array maintains data in sorted order, allows sequential reads at high speeds, and data insertions and deletions with few data writes on average. In addition, the data structure is easy to implement because he employed memory mapping rather than making the data stored on disk be a single level store.
We also have designed cache-oblivious search trees for which the keys can be very long (imagine a key, such as a DNA sequence, that is larger than main memory), and trees into which data can be quickly inserted.
One essential difference is that the B-Tree supports random I/O and the Fractal Tree converts random I/O into sequential I/O, which operates at near disk bandwidth speeds.
At Tokutek, I would review the paper by Bradley C. Kuszmaul, How TokuDB Fractal Tree™ IndexesWork.
Impressive software for the right situation.
The background literature is interesting. Not sure if directly applicable to topic maps or not.