Understanding and Expressing Scalable Concurrency by Aaron Turon.
Abstract
The Holy Grail of parallel programming is to provide good speedup while hiding or avoiding the pitfalls of concurrency. But some level in the tower of abstraction must face facts: parallel processors execute code concurrently, and the interplay between concurrent code, synchronization, and the memory subsystem is a major determiner of performance. Effective parallel programming must ultimately be supported by scalable concurrent algorithms—algorithms that tolerate (or even embrace) concurrency for the sake of scaling with available parallelism. This dissertation makes several contributions to the understanding and expression of such algorithms:
- It shows how to understand scalable algorithms in terms of local protocols governing each part of their hidden state. These protocols are visual artifacts that can be used to informally explain an algorithm at the whiteboard. But they also play a formal role in a new logic for verifying concurrent algorithms, enabling correctness proofs that are local in space, time, and thread execution. Correctness is stated in terms of refinement: clients of an algorithm can reason as if they were using the much simpler specification code it refines.
- It shows how to express synchronization in a declarative but scalable way, based on a new library providing join patterns. By declarative, we mean that the programmer needs only to write down the constraints of a synchronization problem, and the library will automatically derive a correct solution. By scalable, we mean that the derived solutions deliver robust performance with increasing processor count and problem complexity. The library’s performance on common synchronization problems is competitive with specialized algorithms from the literature.
- It shows how to express scalable algorithms through reagents, a new monadic abstraction. With reagents, concurrent algorithms no longer need to be constructed from “whole cloth,” i.e., by using system-level primitives directly. Instead, they are built using a mixture of shared-state and message-passing combinators. Concurrency experts benefit, because they can write libraries at a higher level, with more reuse, without sacrificing scalability. Their clients benefit, because composition empowers them to extend and tailor a library without knowing the details of its underlying algorithms.
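The abstract is dense, so here are three rough sketches of my own to make the bullets concrete. (They are illustrations, not code from the dissertation.) First, refinement in miniature: saying that a scalable counter refines a lock-based specification means clients can reason about the simple version while running the fast one. The class names below are mine.

```scala
import java.util.concurrent.atomic.AtomicLong

// The specification: trivially correct coarse-grained code.
class CounterSpec {
  private var n = 0L
  def inc(): Long = synchronized { n += 1; n }
}

// The implementation: lock-free and scalable under contention.
// "CounterImpl refines CounterSpec" means no client can observe a
// behavior of CounterImpl that CounterSpec could not also produce,
// so clients may reason entirely in terms of CounterSpec.
class CounterImpl {
  private val n = new AtomicLong(0L)
  def inc(): Long = n.incrementAndGet()
}
```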
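Second, a taste of join patterns. The toy below fires its body once a message has arrived on each of two channels. It uses a single lock, so it shows only the declarative interface; the point of the dissertation’s library is that it derives scalable solutions from exactly this kind of declaration. All names here (Join2, sendA, sendB) are my invention.

```scala
import scala.collection.mutable

// Toy join pattern: `body` runs whenever both "channels" hold a
// message. One lock guards both queues, so this is the naive
// solution the programmer gets to avoid writing by hand.
final class Join2[A, B](body: (A, B) => Unit) {
  private val as = mutable.Queue.empty[A]
  private val bs = mutable.Queue.empty[B]

  def sendA(a: A): Unit = fire { as.enqueue(a) }
  def sendB(b: B): Unit = fire { bs.enqueue(b) }

  private def fire(enqueue: => Unit): Unit = {
    val ready = synchronized {
      enqueue
      if (as.nonEmpty && bs.nonEmpty) Some((as.dequeue(), bs.dequeue()))
      else None
    }
    ready.foreach { case (a, b) => body(a, b) } // run body outside the lock
  }
}

// Usage: a one-shot rendezvous between two threads.
object JoinDemo extends App {
  val j = new Join2[String, String]((x, y) => println(s"joined: $x & $y"))
  new Thread(() => j.sendA("left")).start()
  new Thread(() => j.sendB("right")).start()
}
```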
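Third, the reagent idea: a concurrent operation becomes a first-class value built from combinators and run with `!`. This micro-sketch handles only a single-reference update with a spin retry; the real Scala library (whose API differs from my names here) adds sequencing via multi-word CAS, choice, and message-passing combinators.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// A micro-version of the reagent idea: `Reagent[A, B]` is a value
// describing an atomic step from A to B; `!` runs it, retrying on
// contention. Names are my approximation, not the library's API.
trait Reagent[A, B] {
  def tryReact(a: A): Option[B] // one attempt; None means "retry"

  @tailrec final def !(a: A): B = tryReact(a) match {
    case Some(b) => b
    case None    => this ! a    // naive spin; real reagents block/back off
  }
}

final class Ref[S](init: S) {
  private val cell = new AtomicReference(init)

  // Lift a pure function over (state, input) into a CAS retry step.
  def upd[A, B](f: (S, A) => Option[(S, B)]): Reagent[A, B] =
    new Reagent[A, B] {
      def tryReact(a: A): Option[B] = {
        val s = cell.get
        f(s, a) match {
          case Some((s2, b)) if cell.compareAndSet(s, s2) => Some(b)
          case _ => None // conflict or precondition failed: retry
        }
      }
    }
}

// A Treiber stack falls out as two one-line reagents: the reuse
// the third bullet is describing.
final class TreiberStack[A] {
  private val head = new Ref[List[A]](Nil)
  val push: Reagent[A, Unit] = head.upd[A, Unit]((s, a) => Some((a :: s, ())))
  val tryPop: Reagent[Unit, Option[A]] = head.upd[Unit, Option[A]] {
    case (x :: rest, _) => Some((rest, Some(x)))
    case (Nil, _)       => Some((Nil, None))
  }
}
```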
Not for the faint of heart! 😉
But if you are interested in algorithms for a world where processing is parallel by default, best dig in.
I like the author’s imagery of “Go Fish” when he says:
A scalable hashtable is useful not just for concurrent systems; it can also be a boon for explicit parallel programming. A simple but vivid example is the problem of duplicate removal: given a vector of items, return the items in any order, but without any duplicates. Since the input is unstructured, any way of dividing it amongst parallel threads appears to require global coordination to discover duplicate items. The key to avoiding a multiprocessor game of “Go Fish” is to focus on producing the output rather than dividing the input. If threads share a scalable hashtable that allows parallel insertion of distinct elements, they can construct the correct output with (on average) very little coordination, by simply each inserting a segment of the input into the table, one element at a time.
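To sketch that in code: with the JDK’s off-the-shelf ConcurrentHashMap standing in for the dissertation’s scalable hashtable, duplicate removal really is just “everyone inserts their slice.” The dedup helper and thread count below are my choices, not the dissertation’s.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

object Dedup {
  // Produce the output instead of dividing the input: each thread
  // inserts its slice into one shared concurrent set, so threads
  // coordinate only when they collide on the same bucket.
  def dedup[A](input: Vector[A], nThreads: Int = 4): Vector[A] = {
    val seen = ConcurrentHashMap.newKeySet[A]()
    val chunkSize = math.max(1, input.size / nThreads)
    val threads = input.grouped(chunkSize).map { chunk =>
      new Thread(() => chunk.foreach(seen.add))
    }.toList
    threads.foreach(_.start())
    threads.foreach(_.join())
    seen.asScala.toVector // any order, no duplicates
  }
}
```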
Now that I think about it, topic map processing does a lot of duplicate removal.
Topic maps in a parallel processing environment, anyone?
I first saw this in a tweet by Alex Clemmer.