Adrian’s review of Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections by Steinforder & Vinju, 2015, starts this way:
You’d think that the collection classes in modern JVM-based languages would be highly efficient at this point in time – and indeed they are. But the wonderful thing is that there always seems to be room for improvement. Today’s paper examines immutable collections on the JVM – in particular, in Scala and Clojure – and highlights a new CHAMPion data structure that offers 1.3-6.7x faster iteration, and 3-25.4x faster equality checking.
CHAMP stands for Compressed Hash-Array Mapped Prefix-tree.
The use of immutable collections is on the rise…
Immutable collections are a specific area most relevant to functional/object-oriented programming such as practiced by Scala and Clojure programmers. With the advance of functional language constructs in Java 8 and functional APIs such as the stream processing API, immutable collections become more relevant to Java as well. Immutability for collections has a number of benefits: it implies referential transparency without giving up on sharing data; it satisfies safety requirements for having co-variant sub-types; it allows to safely share data in presence of concurrency.
Both Scala and Clojure use a Hash-Array Mapped Trie (HAMT) data structure for immutable collections. The HAMT data structure was originally developed by Bagwell in C/C++. It becomes less efficient when ported to the JVM due to the lack of control over memory layout and the extra indirection caused by arrays also being objects. This paper is all about the quest for an efficient JVM-based derivative of HAMTs.
Fine-tuning data structures for cache locality usually improves their runtime performance. However, HAMTs inherently feature many memory indirections due to their tree-based nature, notably when compared to array-based data structures such as hashtables. Therefore HAMTs presents an optimization challenge on the JVM. Our goal is to optimize HAMT-based data structures such that they become a strong competitor of their optimized array-based counterparts in terms of speed and memory footprints.
Adrian had me at: “a new CHAMPion data structure that offers 1.3-6.7x faster iteration, and 3-25.4x faster equality checking.”
If you want experience with the proposed data structures, the authors have implemented them in the Rascal Metaprogramming Language.
I first saw this in a tweet by Atabey Kaygun