Abstract:
The GraphLab abstraction is the product of several years of research in designing and implementing systems for statistical inference in probabilistic graphical models. Early in our work [12], we discovered that the high-level parallel abstractions popular in the ML community, such as MapReduce [2, 13] and parallel BLAS [14] libraries, are unable to express statistical inference algorithms efficiently. Our work revealed that an efficient system for graphical model inference must explicitly represent the sparse dependencies between random variables and adapt to the input data and model parameters.
Guided by this intuition, we spent over a year designing and implementing various machine learning algorithms on top of low-level threading primitives and distributed communication frameworks such as OpenMP [15], CILK++ [16], and MPI [1]. Through this process, we identified a set of core algorithmic patterns common to a wide range of machine learning techniques. In the following, we detail our findings and motivate why a new framework is needed (see Table 1).