Archive for the ‘Concurrent Programming’ Category

Concurrency, Specification & Programming (CS&P 2015)

Thursday, October 29th, 2015

Concurrency, Specification & Programming, volume 1, Zbigniew Suraj, Ludwik Czaja (Eds.)

Concurrency, Specification & Programming, volume 2, Zbigniew Suraj, Ludwik Czaja (Eds.)

From the preface:

This two-volume book contains the papers selected for presentation at the Concurrency, Specification and Programming (CS&P) Workshop. It is taking place from 28th to 30th September 2015 in Rzeszow, the biggest city in southeastern Poland. CS&P provides an international forum for exchanging scientific, research, and technological achievements in concurrency, programming, artificial intelligence, and related fields. In particular, major areas selected for CS&P 2015 include mathematical models of concurrency, data mining and applications, fuzzy computing, logic and probability in theory of computing, rough and granular computing, unconventional computing models. In addition, three plenary keynote talks were delivered.

Not for the faint of heart but if you are interested in the future of computing, these two volumes should be on your reading list.

Flow: Actor-based Concurrency with C++ [FoundationDB]

Saturday, February 14th, 2015

Flow: Actor-based Concurrency with C++

From the post:

FoundationDB began with ambitious goals for both high performance per node and scalability. We knew that to achieve these goals we would face serious engineering challenges while developing the FoundationDB core. We’d need to implement efficient asynchronous communicating processes of the sort supported by Erlang
or the Async library in .NET, but we’d also need the raw speed and I/O efficiency of C++. Finally, we’d need to perform extensive simulation to engineer for reliability and fault tolerance on large clusters.

To meet these challenges, we developed several new tools, the first of which is Flow, a new programming language that brings actor-based concurrency to C++11. To add this capability, Flow introduces a number of new keywords and control-flow primitives for managing concurrency. Flow is implemented as a compiler which analyzes an asynchronous function (actor) and rewrites it as an object with many different sub-functions that use callbacks to avoid blocking (see streamlinejs for a similar concept using JavaScript). The Flow compiler’s output is normal C++11 code, which is then compiled to a binary using traditional tools. Flow also provides input to our simulation tool, Lithium, which conducts deterministic simulations of the entire system, including its physical interfaces and failure modes. In short, Flow allows efficient concurrency within C++ in a maintainable and extensible manner, achieving all three major engineering goals:

  • high performance (by compiling to native code),
  • actor-based concurrency (for high productivity development),
  • simulation support (for testing).

Flow Availability

Flow is not currently available outside of FoundationDB, but we’d like to open-source it in the future. If you’d like to stay in the loop with our progress subscribe below.

Are you going to be ready when Flow is released separate from FoundationDB?

The Little Book of Semaphores

Thursday, August 21st, 2014

The Little Book of Semaphores by Allen Downey.

From the webpage:

The Little Book of Semaphores is a free (in both senses of the word) textbook that introduces the principles of synchronization for concurrent programming.

In most computer science curricula, synchronization is a module in an Operating Systems class. OS textbooks present a standard set of problems with a standard set of solutions, but most students don’t get a good understanding of the material or the ability to solve similar problems.

The approach of this book is to identify patterns that are useful for a variety of synchronization problems and then show how they can be assembled into solutions. After each problem, the book offers a hint before showing a solution, giving students a better chance of discovering solutions on their own.

The book covers the classical problems, including “Readers-writers,” “Producer-consumer”, and “Dining Philosophers.” In addition, it collects a number of not-so-classical problems, some written by the author and some by other teachers and textbook writers. Readers are invited to create and submit new problems.

If you want a deep understanding of concurrency, this looks like a very good place to start!

Some of the more colorful problem names:

  • The dining savages problem
  • The Santa Claus problem
  • The unisex bathroom problem
  • The Senate Bus problem

There are problems (and patterns) for your discovery and enjoyment!

I first saw this in a tweet by Computer Science.

Understanding and Expressing Scalable Concurrency

Saturday, July 5th, 2014

Understanding and Expressing Scalable Concurrency by Aaron Turon.


The Holy Grail of parallel programming is to provide good speedup while hiding or avoiding the pitfalls of concurrency. But some level in the tower of abstraction must face facts: parallel processors execute code concurrently, and the interplay between concurrent code, synchronization, and the memory subsystem is a major determiner of performance. Effšective parallel programming must ultimately be supported by scalable concurrent algorithms—algorithms that tolerate (or even embrace) concurrency for the sake of scaling with available parallelism. This dissertation makes several contributions to the understanding and expression of such algorithms:

  • It shows how to understand scalable algorithms in terms of local protocols governing each part of their hidden state. These protocols are visual artifacts that can be used to informally explain an algorithm at the whiteboard. But they also play a formal role in a new logic for verifying concurrent algorithms, enabling correctness proofs that are local in space, time, and thread execution. Correctness is stated in terms of refinement: clients of an algorithm can reason as if they were using the much simpler specification code it refines.
  • It shows how to express synchronization in a declarative but scalable way, based on a new library providing join patterns. By declarative, we mean that the programmer needs only to write down the constraints of a synchronization problem, and the library will automatically derive a correct solution.By scalable, we mean that the derived solutions deliver robust performance with increasing processor count and problem complexity. The library’s performance on common synchronization problems is competitive with specialized algorithms from the literature.
  • It shows how to express scalable algorithms through reagents, a new monadic abstraction. With reagents, concurrent algorithms no longer need to be constructed from “wholecloth,” i.e., by using system-level primitives directly. Instead, they are built using a mixture of shared-state and message-passing combinators. Concurrency experts benefit, because they can write libraries at a higher level, with more reuse, without sacrificing scalability. Their clients benefit, because composition empowers them to extend and tailor a library without knowing the details of its underlying algorithms.

Not for the faint of heart! 😉

But if you are interested in algorithms for when processing is always parallel by default, best dig in.

I like the author’s imagery of “Go Fish” when he says:

A scalable hashtable is useful not just for concurrent systems; it can also be a boon for explicit parallel programming. A simple but vivid example is the problem of duplicate removal: given a vector of items, return the items in any order, but without any duplicates. Since the input is unstructured, any way of dividing it amongst parallel threads appears to require global coordination to discover duplicate items. The key to avoiding a multiprocessor game of “Go Fish” is to focus on producing the output rather than dividing the input. If threads share a scalable hashtable that allows parallel insertion of distinct elements, they can construct the correct output with (on average) very little coordination, by simply each inserting a segment of the input into the table, one element at a time.

Now that I think about it, topic map processing does a lot of duplicate removal.

Topic maps in a parallel processing environment anyone?

I first saw this in a tweet by Alex Clemmer.

Future Values of Merges

Thursday, April 17th, 2014

Multilisp: A Language for Concurrent Symbolic Computation by Robert H. Halstead. (ACM Transactions on Programming Languages and Systems, Vol. 7, No. 4, October 1985, Pages 501-538.)


Multilisp is a version of the Lisp dialect Scheme extended with constructs for parallel execution. Like Scheme, Multilisp is oriented toward symbolic computation. Unlike some parallel programming languages, Multilisp incorporates constructs for causing side effects and for explicitly introducing parallelism. The potential complexity of dealing with side effects in a parallel context is mitigated by the nature of the parallelism constructs and by support for abstract data types: a recommended Multilisp programming style is presented which, if followed, should lead to highly parallel, easily understandable programs. Multilisp is being implemented on the 32-processor Concert multiprocessor; however, it is ulti-mately intended for use on larger multiprocessors. The current implementation, called Concert Multilisp, is complete enough to run the Multilisp compiler itself and has been run on Concert prototypes including up to eight processors. Concert Multilisp uses novel techniques for task scheduling and garbage collection. The task scheduler helps control excessive resource utilization by means of an unfair scheduling policy; the garbage collector uses a multiprocessor algorithm based on the incremental garbage collector of Baker.

Of particular interest:

Multilisp’s principal construct for both creating tasks and synchronizing among them is the future. The construct ( future X ) immediately returns a future for the value of the expression X and concurrently begins evaluating X. When the evaluation of X yields a value, that value replaces the future. The future is said to be initially undetermined; it becomes determined when its value has been computed. An operation (such as addition) that needs to know the value of an undetermined future will be suspended until the future becomes determined, but many operations, such as assignment and parameter passing, do not need to know anything about the values of their operands and may be performed quite comfortably on undetermined futures. The use of futures often exposes surprisingly large amounts of parallelism in a program, as illustrated by a Quicksort program given in Figure 1.

The use of a future value for merge operations, which could avoid re-processing a topic map for each cycle of merging, sounds promising.

Deferral of the results isn’t just an old Lisp idea as you will find in: Counting complex disordered states by efficient pattern matching: chromatic polynomials and Potts partition functions by Marc Timme, Frank van Bussel, Denny Fliegner, and Sebastian Stolzenberg.


Counting problems, determining the number of possible states of a large system under certain constraints, play an important role in many areas of science. They naturally arise for complex disordered systems in physics and chemistry, in mathematical graph theory, and in computer science. Counting problems, however, are among the hardest problems to access computationally. Here, we suggest a novel method to access a benchmark counting problem, finding chromatic polynomials of graphs. We develop a vertex-oriented symbolic pattern matching algorithm that exploits the equivalence between the chromatic polynomial and the zero-temperature partition function of the Potts antiferromagnet on the same graph. Implementing this bottom-up algorithm using appropriate computer algebra, the new method outperforms standard top-down methods by several orders of magnitude, already for moderately sized graphs. As a first application, we compute chromatic polynomials of samples of the simple cubic lattice, for the first time computationally accessing three-dimensional lattices of physical relevance. The method offers straightforward generalizations to several other counting problems.

In lay person’s terms the work by Timme and company visits each node in a graph and records an expression that includes unkowns (futures?), that is the values at other nodes in the graph. Using pattern matching techniques, the algorithm then solves all of the unknowns and replaces them with appropriate values. How effective is this?

“The existing algorithm copies the whole network for each stage of the calculation and only changes one aspect of it each time,” explains Frank van Bussel of the Max Planck Institute for Dynamics and Self-Organization (MPIDS). Increasing the number of nodes dramatically increases the calculation time. For a square lattice the size of a chess board, this is estimated to be many billions of years. The new algorithm developed by the Göttingen-based scientists is significantly faster. “Our calculation for the chess board lattice only takes seven seconds,” explains Denny Fliegner from MPIDS. (A New Kind of Counting)

Hmmm, “many billions of years” versus “seven seconds.”

For further details on the pattern matching work see the project page at: Complex Disordered Systems: Statistical Physics and Symbolic Computation

Deferral of results looks like a fruitful area for research for topic maps in general and parallel processing of topic maps in particular.

I first saw the Multilisp paper in a tweet by Mom of Finland.

Monitoring Real-Time Bidding at AdRoll

Friday, March 7th, 2014

Monitoring Real-Time Bidding at Adroll by Brian Troutwine.

From the description:

This is the talk I gave at Erlang Factory SF Bay Area 2014. In it I discussed the instrumentation by default approach taken in the AdRoll real-time bidding team, discuss the technical details of the libraries we use and lessons learned to adapt your organization to deal with the onslaught of data from instrumentation.

The problem domain:

  • Low latency ( < 100ms per transaction )
  • Firm real-time system
  • Highly concurrent ( > 30 billion transactions per day )
  • Global, 24/7 operation

(emphasis in original)

They are not doing semantic processing subject to those requirements. 😉

But, that’s ok. If needed, you can assign semantics to the data and its containers separately.

A very impressive use of Erlang.

Erlang – Concurrent Language for Concurrent World

Sunday, October 27th, 2013

Erlang – Concurrent Language for Concurrent World by Zvi Avraham.

If you need to forward a “why Erlang” to a programmer, this set of slides should be near the top of your list.

It includes this quote from Joe Armstrong:

The world is concurrent… I could not drive the car, if I did not understand concurrency…”

Which makes me wonder: Do all the drivers I have seen in Atlanta understand concurrency?

That would really surprise me. 😉

The point should be that systems should be concurrent by their very nature, like the world around us.

Users should object when systems exhibit sequential behavior.

Systems that run forever self-heal and scale (Scaling Topic Maps?)

Monday, August 19th, 2013

Systems that run forever self-heal and scale by Joe Armstrong.

From the description:

Joe Armstrong outlines the architectural principles needed for building scalable fault-tolerant systems built from small isolated parallel components which communicate though well-defined protocols.

Great visuals on the difference between imperative programming and concurrent programming.

About half of the data transmission from smart phones uses Erlang.

A very high level view of the architectural principles for building scalable fault-tolerant systems.

All of Joe’s talk is important but for today I want to focus on his first principle for scalable fault-tolerant systems:


Joe enumerates the benefits of isolation of processes as follows:

Isolation enables:

  • Fault-tolerant
  • Scalability
  • Reliability
  • Testability
  • Comprehensibility
  • Code Upgrade

Are you aware of any topic map engine that uses multiple, isolated processes for merging topics?

Not threads, but processes.

Threads being managed by an operating system scheduler are not really parallel processes, whatever its appearance to the casual user. Erlang processes, on the other hand, do run in parallel and when more processes are required, simply add more hardware.

We could take a clue from Scalable SPARQL Querying of Large RDF Graphs Jiewen Huang, Daniel J. Abadi and, Kun Ren, partitioning parts of a topic map into different data stores and querying each store for a part of any query.

But that’s adapting data to a sequential process, not a bad solution but one that you will have to repeat as data or queries change and evolve. Pseudo-parallelism.

One of a concurrent process approach on immutable topics, associations, occurrences (see Working with Immutable Data by Saša Jurić) would be that different processes could be applying different merging tests to the same set of topics, associations, occurrences.

Or the speed of your answer might depend on whether you have sent a query over a “free” interface, which is supported by a few processes or over a subscription interface, which has dozens if not hundreds of processes at your disposal.

The speed and comprehensiveness of a topic map answer to any query might be a economic model for a public topic map service.

If all I want to know about Anthony Weiner was: “Vote NO!” that could be free.

If you wanted pics, vics and all, that could be a different price.

GHCJS introduction – Concurrent Haskell in the browser

Thursday, June 27th, 2013

GHCJS introduction – Concurrent Haskell in the browser by Luite Stegeman.

From the post:

After many months of hard work, we are finally ready to show you the new version of GHCJS. Our goal is to provide a full-featured Haskell runtime in the browser, with features like threading, exceptions, weak references and STM, allowing you to run existing libraries with minimal modification. In addition we have some JavaScript-specific features to make communication with the JS world more convenient. GHCJS uses its own package database and comes with Cabal and Template Haskell support.

The new version (gen2) is almost a ground-up rewrite of the older (trampoline/plain) versions. We built on our experience with the trampoline code generator, by Victor Nazarov and Hamish Mackenzie. The most important changes are that we now use regular JavaScript objects to store Haskell heap objects (instead of JS closures), and that we have switched to explicit stacks in a JavaScript array. This helps reduce the amount of memory allocation considerably, making the resulting code run much faster. GHCJS now uses Gershom Bazerman’s JMacro library for generating JavaScript and for the Foreign Function Interface.

This post is a practical introduction to get started with GHCJS. Even though the compiler mostly works now, it’s far from finished. We’d love to hear some feedback from users and get some help preparing libraries for GHCJS before we make our first official release (planned to coincide with the GHC 7.8 release). I have listed some fun projects that you can help us with at the end of this post. Join #ghcjs on freenode for discussion with the developers and other users.

Functional programming in your browser.

Parallel and Concurrent Programming in Haskell

Saturday, March 30th, 2013

Parallel and Concurrent Programming in Haskell by Simon Marlow.

From the introduction:

While most programming languages nowadays provide some form of concurrent or parallel programming facilities, very few provide as wide a range as Haskell. Haskell prides itself on having the right tool for the job, for as many jobs as possible. If a job is discovered for which there isn’t already a good tool, Haskell’s typical response is to invent a new tool. Haskell’s abstraction facilities provide a fertile ground on which to experiment with different programming idioms, and that is exactly what has happened in the space of concurrent and parallel programming.

Is this a good or a bad thing? You certainly can get away with just one way of writing concurrent programs: threads and locks are in principle all you need. But as the programming community has begun to realise over the last few years, threads and locks are not the right tool for most jobs. Programming with them requires a high degree of expertise even for simple tasks, and leads to programs that have hard-to-diagnose faults.

So in Haskell we embrace the idea that different problems require different tools, and we provide the programmer with a rich selection to choose from. The inevitable downside is that there is a lot to learn, and that is what this book is all about.

In this book I will discuss how to write parallel and concurrent programs in Haskell, ranging from the simple uses of parallelism to speed up computation-heavy programs, to the use of lightweight threads for writing high-speed concurrent network servers. Along the way we’ll see how to use Haskell to write programs that run on the powerful processor in a modern graphics card (GPU), and to write programs that can run on multiple machines in a network (distributed programming).

In O’Reilly’s Open Feedback Publishing System.

If you really want to learn something, write a book about it, edit a book about it or teach a class about it.

Here’s your chance for #2.

Read carefully!

I first saw this in Christophe Lalanne’s A bag of tweets / March 2013.

Concurrency Improvements in TokuDB v6.6 (Part 1)

Friday, February 1st, 2013

Concurrency Improvements in TokuDB v6.6 (Part 1)

From the post:

With TokuDB v6.6 out now, I’m excited to present one of my favorite enhancements: concurrency within a single index. Previously, while there could be many SQL transactions in-flight at any given moment, operations inside a single index were fairly serialized. We’ve been working on concurrency for a few versions, and things have been getting a lot better over time. Today I’ll talk about what to expect from v6.6. Next time, we’ll see why.

Impressive numbers as always!

Should get you interested in learning how this was done as an engineering matter. (That’s in part 2.)

Concurrent Programming for Scalable Web Architectures

Wednesday, June 6th, 2012

Concurrent Programming for Scalable Web Architectures by Benjamin Erb.


Web architectures are an important asset for various large-scale web applications, such as social networks or e-commerce sites. Being able to handle huge numbers of users concurrently is essential, thus scalability is one of the most important features of these architectures. Multi-core processors, highly distributed backend architectures and new web technologies force us to reconsider approaches for concurrent programming in order to implement web applications and fulfil scalability demands. While focusing on different stages of scalable web architectures, we provide a survey of competing concurrency approaches and point to their adequate usages.

High Scalability has a good list of topics and the table of contents.

Or you can jump to the thesis homepage.

Just in case you are thinking about taking your application to “web scale.” 😉