From the post:
Because the dataflow operators in a graph work in parallel, the model allows overlapping I/O operations with computation. This is a “whole application” approach to parallelization as opposed to many thread-oriented performance frameworks that focus on hot sections of code such as for loops. This addresses a key problem in processing “big data” for today’s many-core processors: feeding data fast enough to the processors.
While dataflow does this, it scales down easily as well. This distinguishes it from technologies, such as Hadoop and to a lesser extent, Map Reduce, which don’t scale downward well due to their innate complexity.
We’ve discussed how a dataflow architecture exploits multicore. These same principles can be applied to multi-node clusters by extending dataflow queues over networks with a dataflow graph executed on multiple systems in parallel. The compositional model of building dataflow graphs allows for replication of pieces of the graph across multiple nodes. Scaling out extends the reach of dataflow to solve large data problems.
Read the first comment. By J.P. Morrison, author of Flow Based Programming, the inspiration for Pipes. (Shamelessly repeated from a post by Marko Rodriguez on the gremlin-users list, Achim first noticed the article.)
Be aware that Amazon lists the Kindle edition for \$29.00 and a hardback edition for \$69.00. Sadly one reader reports the book has no index?