Archive for the ‘Graph Partitioning’ Category

Parallel Graph Partitioning for Complex Networks

Monday, April 21st, 2014

Parallel Graph Partitioning for Complex Networks by Henning Meyerhenke, Peter Sanders, and Christian Schulz.

Abstract:

Processing large complex networks like social networks or web graphs has recently attracted considerable interest. In order to do this in parallel, we need to partition them into pieces of about equal size. Unfortunately, previous parallel graph partitioners originally developed for more regular mesh-like networks do not work well for these networks. This paper addresses this problem by parallelizing and adapting the label propagation technique originally developed for graph clustering. By introducing size constraints, label propagation becomes applicable for both the coarsening and the refinement phase of multilevel graph partitioning. We obtain very high quality by applying a highly parallel evolutionary algorithm to the coarsened graph. The resulting system is both more scalable and achieves higher quality than state-of-the-art systems like ParMetis or PT-Scotch. For large complex networks the performance differences are very big. For example, our algorithm can partition a web graph with 3.3 billion edges in less than sixteen seconds using 512 cores of a high performance cluster while producing a high quality partition — none of the competing systems can handle this graph on our system.

Clustering in this article is defined by a node’s “neighborhood.” I am curious whether defining a “neighborhood” based on multi-part (hierarchical?) identifiers might enable parallel processing of merging conditions.
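To make the technique from the abstract concrete, here is a minimal sequential sketch of size-constrained label propagation. The graph, the size cap, and the tie-breaking are my own illustrative assumptions, not the paper’s actual (parallel) implementation:

```python
from collections import Counter

def label_propagation(adj, max_block_size, rounds=10):
    """Size-constrained label propagation: each node adopts the label
    most common among its neighbors, unless that label's block is full."""
    labels = {v: v for v in adj}          # start with singleton clusters
    sizes = Counter(labels.values())      # current block sizes
    for _ in range(rounds):
        changed = False
        for v in adj:
            counts = Counter(labels[u] for u in adj[v])
            # prefer the most frequent neighbor label whose block has room
            for lab, _ in counts.most_common():
                if lab == labels[v]:
                    break
                if sizes[lab] < max_block_size:
                    sizes[labels[v]] -= 1
                    sizes[lab] += 1
                    labels[v] = lab
                    changed = True
                    break
        if not changed:
            break
    return labels

# two triangles joined by a single edge
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = label_propagation(adj, max_block_size=3)
```

On this toy graph the size constraint forces the two triangles into separate blocks of three; without it, plain label propagation could collapse everything into a single cluster, which is exactly why the constraint makes the technique usable for partitioning.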

While looking for resources on graph contraction, I encountered a series of lectures by Kanat Tangwongsan from: Parallel and Sequential Data Structures and Algorithms, 15-210 (Spring 2012) (link to the course schedule with numerous resources):

Lecture 17 — Graph Contraction I: Tree Contraction

Lecture 18 — Graph Contraction II: Connectivity and MSTs

Lecture 19 — Graph Contraction III: Parallel MST and MIS

Enjoy!

KaHIP – Karlsruhe High Quality Partitioning (Graphs)

Saturday, November 30th, 2013

KaHIP – Karlsruhe High Quality Partitioning

From the webpage:

The graph partitioning problem asks for a division of a graph’s node set into k equally sized blocks such that the number of edges that run between the blocks is minimized. An example graph that is partitioned into four blocks:

[image: the “airfoil” mesh partitioned into four blocks]
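The objective being minimized is easy to state in code. A small sketch (my own illustration, not KaHIP code) that counts the cut edges of a given assignment of nodes to blocks:

```python
def edge_cut(edges, block):
    """Number of edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if block[u] != block[v])

# a 6-node path, split into two blocks of three
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
block = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(edge_cut(edges, block))  # → 1
```

A partitioner searches over such assignments, subject to the balance constraint on block sizes, for one minimizing this count.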

KaHIP – Karlsruhe High Quality Partitioning – is a family of graph partitioning programs. It includes KaFFPa (Karlsruhe Fast Flow Partitioner) in its variants Strong, Eco and Fast, KaFFPaE (KaFFPaEvolutionary) which is a parallel evolutionary algorithm that uses KaFFPa to provide combine and mutation operations, as well as KaBaPE which extends the evolutionary algorithm. Moreover, specialized techniques are included to partition road networks (Buffoon) and to output a vertex separator from a given partition.

Licence

The program is licenced under GPL 3.0. Please let us know if you need a commercial licence.

If you publish results using our algorithms, please acknowledge our work by quoting the following paper:

@inproceedings{sandersschulz2013,
AUTHOR = {Sanders, Peter and Schulz, Christian},
TITLE = {{Think Locally, Act Globally: Highly Balanced Graph Partitioning}},
BOOKTITLE = {Proceedings of the 12th International Symposium on Experimental Algorithms (SEA’13)},
SERIES = {LNCS},
PUBLISHER = {Springer},
YEAR = {2013},
VOLUME = {7933},
PAGES = {164--175}
}

The algorithms that are included for download are mainly based on the following publications:

  • Peter Sanders and Christian Schulz. Engineering Multilevel Graph Partitioning Algorithms. In Proceedings of the 19th European Symposium on Algorithms (ESA’11), volume 6942 of LNCS, pages 469–480. Springer, 2011. Download PDF.
  • Peter Sanders and Christian Schulz. Distributed Evolutionary Graph Partitioning. In Proceedings of the 12th Workshop on Algorithm Engineering and Experimentation (ALENEX’12), pages 16–19, 2012. Download PDF.
  • Peter Sanders and Christian Schulz. High Quality Graph Partitioning. In Proceedings of the 10th DIMACS Implementation Challenge Workshop: Graph Partitioning and Graph Clustering, pages 1–17, AMS, 2013. Download PDF.
  • Peter Sanders and Christian Schulz. Think Locally, Act Globally: Highly Balanced Graph Partitioning. In Proceedings of the 12th International Symposium on Experimental Algorithms (SEA’13), volume 7933 of LNCS, pages 164–175, 2013. Download PDF.
  • Christian Schulz. High Quality Graph Partitioning. PhD thesis. Karlsruhe Institute of Technology, 2013.
    ISBN 978-3844264623, epubli GmbH. Download PDF.

Download

News of interest to the graph side of the house!

And to topic maps for the same reason, that is, obtaining the smallest number of associations that run across partitions.

Although, due to merging, topic maps present additional complications. It isn’t possible to predict when additions to one partition may result in merges across one or more partitions.

I’m not sure how that would be anticipated except by restrictions on merging rules. Suggestions?

I first saw this at: Enhancing Efficiency of Complex Computations.

GraphLab: A Distributed Abstraction…

Monday, March 11th, 2013

GraphLab: A Distributed Abstraction for Machine Learning in the Cloud by Carlos Guestrin. (video)

Take away line: “How does a vertex think?”

Deeply impressive presentation and performance numbers!

Resources:

GraphLab 2.1: http://graphlab.org

GraphChi 0.1: http://graphchi.org

Slides from the talk.

This needs to be very high on your graph reading/following list.

I first saw this at: SFBayACM talk: GraphLab framework for Machine Learning in the Cloud.

Graph Partitioning and Expanders (April 2013)

Saturday, March 9th, 2013

Graph Partitioning and Expanders by Professor Luca Trevisan.

From the description:

In this research-oriented graduate course, we will study algorithms for graph partitioning and clustering, constructions of expander graphs, and analysis of random walks. These are three topics that build on the same mathematical background and that have several important connections: for example it is possible to find graph clusters via random walks, and it is possible to use the linear programming approach to graph partitioning as a way to study random walks.

We will study spectral graph theory, which explains how certain combinatorial properties of graphs are related to the eigenvalues and eigenvectors of the adjacency matrix, and we will use it to describe and analyze spectral algorithms for graph partitioning and clustering. Spectral graph theory will recur as an important tool in the rest of the course. We will also discuss other approaches to graph partitioning via linear programming and semidefinite programming. Then we will study constructions of expander graphs, which are graphs with very strong pseudorandomness properties, which are useful in many applications, including in cryptography, in complexity theory, in algorithms and data structures, and in coding theory. Finally, we will study the mixing time of random walks, a problem that comes up in several applications, including the analysis of the convergence time of certain randomized algorithms, such as the Metropolis algorithm.
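The random-walk side of the course can be previewed in a few lines. On a connected undirected graph the lazy random walk converges to the stationary distribution π(v) = deg(v)/2m, and it mixes quickly exactly when the graph has no sparse cut, which is the link back to partitioning. A self-contained sketch (the example graph is my own, not from the course):

```python
def lazy_walk_step(adj, p):
    """One step of the lazy random walk: stay put with probability 1/2,
    otherwise move to a uniformly random neighbor."""
    q = {v: 0.5 * p[v] for v in adj}
    for v, nbrs in adj.items():
        share = 0.5 * p[v] / len(nbrs)
        for u in nbrs:
            q[u] += share
    return q

# two triangles joined by a single (bottleneck) edge
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
m = sum(len(n) for n in adj.values()) / 2          # number of edges
pi = {v: len(adj[v]) / (2 * m) for v in adj}       # stationary distribution

p = {v: (1.0 if v == 0 else 0.0) for v in adj}     # start at node 0
for _ in range(200):
    p = lazy_walk_step(adj, p)

tv = 0.5 * sum(abs(p[v] - pi[v]) for v in adj)     # total variation distance
```

After a couple of hundred steps the walk is essentially stationary; on a graph with a much sparser bottleneck the same experiment would show markedly slower convergence.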

Workload

about 8 hours per week

Prerequisites

linear algebra, discrete probability, and algorithms

The Instructor

Luca Trevisan is a professor of computer science at Stanford University. Before joining Stanford in 2010, Luca taught at Columbia University and at the University of California, Berkeley.

Luca’s research is in theoretical computer science, and he has worked on average-case complexity theory, pseudorandomness and derandomization, hardness of approximation, probabilistically checkable proofs, and approximation algorithms. In the past three years he has been working on spectral graph theory and its applications to graph algorithms.

Luca received the STOC’97 Danny Lewin award, the 2000 Oberwolfach Prize, and the 2000 Sloan Fellowship. He was an invited speaker at the 2006 International Congress of Mathematicians in Madrid.

Not for the faint of heart!

But on the other hand, if you want to be on the cutting edge of graph development….

Graph partitioning in MapReduce with Cascading

Thursday, February 16th, 2012

Graph partitioning in MapReduce with Cascading in two parts by Friso van Vollenhoven.

Graph partitioning in MapReduce with Cascading (Part 1).

From the post:

I have recently had the joy of doing MapReduce based graph partitioning. Here’s a post about how I did that. I decided to use Cascading for writing my MR jobs, as it is a lot less verbose than raw Java based MR. The graph algorithm consists of one step to prepare the input data and then an iterative part that runs until convergence. The program uses a Hadoop counter to check for convergence and will stop iterating once there. All code is available. Also, the explanation has colorful images of graphs. (And everything is written very informally and there is no math.)

Graph partitioning part 2: connected graphs.

From the post:

In a previous post, we talked about finding the partitions in a disconnected graph using Cascading. In reality, most graphs are actually fully connected, so only being able to partition already disconnected graphs is not very helpful. In this post, we’ll take a look at partitioning a connected graph based on some criterion for creating a partition boundary.
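The iterate-until-convergence structure the posts describe can be mimicked, minus Cascading and Hadoop, in a few lines of plain Python: each node repeatedly adopts the smallest label in its neighborhood, and a count of changed labels plays the role of the Hadoop counter. This is my own sketch of the idea, not the author’s code:

```python
def connected_components(adj):
    """Iterative min-label propagation; the `changed` count plays the
    role of the Hadoop counter used to detect convergence."""
    label = {v: v for v in adj}
    while True:
        changed = 0
        for v, nbrs in adj.items():
            best = min([label[v]] + [label[u] for u in nbrs])
            if best != label[v]:
                label[v] = best
                changed += 1
        if changed == 0:     # convergence: no label moved this round
            break
    return label

# two disconnected components
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(connected_components(adj))  # → {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```

In the MapReduce setting each round becomes one job, with the mapper emitting labels to neighbors and the reducer taking the minimum, so the number of rounds (graph diameter, roughly) determines job count.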

Very accessible explanations complete with source code (github).

What puzzles me about the efforts to develop an algorithm to automatically partition a graph database is that there is no corresponding effort to develop an algorithm to automatically partition relational databases. Yet we know that relational databases can be represented as graphs. So what’s the difference?

Granted, graphs such as Facebook, the WWW, etc., have grown without planning and so aren’t subject to the same partitioning considerations as relational databases. But isn’t there a class of graphs that are closer to relational databases than Facebook?

Consider that diverse research facilities for a drug company could use graph databases for research purposes but that doesn’t mean that any user can create edges between nodes at random. Any more than a user of a sharded database can create arbitrary joins.

I deeply enjoy graph posts such as these by Friso van Vollenhoven but the “cool” aspects of using MapReduce should not keep us from seeing heuristics we can use to enhance the performance of graph databases.

Graph-Database.org

Wednesday, November 2nd, 2011

Graph-Database.org

Interesting site with a number of presentations/resources on graphs. Worth a visit.

Partitioning Graph Databases

Wednesday, June 15th, 2011

Partitioning Graph Databases by ALEX AVERBUCH & MARTIN NEUMANN.

Focusing on Neo4j, reports that compared to random partitioning, use of algorithms herein result in a reduction of inter-partition traffic by 40 to 90%, depending on the dataset.

Abstract:

The amount of globally stored, electronic data is growing at an increasing rate. This growth is both in size and connectivity, where connectivity refers to the increasing presence of, and interest in, relationships between data [12]. An example of such data is the social network graph created and stored by Twitter [2].

Due to this growth, demand is increasing for technologies that can process such data. Currently relational databases are the predominant data storage technology, but they are poorly suited to processing connected data as they are optimized for index-intensive operations. Conversely, the storage engines of graph databases are optimized for graph computation as they store records adjacent to one another, linked by direct references. This enables retrieval of adjacent elements in constant time, regardless of graph size, and allows for relationships to be followed without performing costly index lookups. However, as data volume increases these databases outgrow the resources available on a single computer, and partitioning the data becomes necessary. At present, few graph databases are capable of doing this [6].
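The “direct references” point in the abstract is the index-free adjacency property of graph stores. A toy model (my own illustration, not Neo4j internals) shows the idea, with each node holding direct references to its neighbors:

```python
# Index-free adjacency: each node keeps direct references to its
# neighbors, so following a relationship is a pointer hop rather
# than an index lookup (toy model, not an actual storage engine).
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []   # direct references, no index needed

a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors = [b, c]
b.neighbors = [a]
c.neighbors = [a]

# retrieving adjacent elements costs O(deg(v)), independent of graph size
print([n.name for n in a.neighbors])  # → ['b', 'c']
```

A relational store would instead resolve each hop through an index on a join table, with a cost that grows with the size of that index.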

In this work we evaluate the viability of using graph partitioning algorithms as a means of partitioning graph databases, with focus on the Neo4j graph database [4]. For this purpose, a prototype partitioned database was developed. Then, three partitioning algorithms were explored and one implemented. During evaluation, three graph datasets were used: two from production applications, and one synthetically generated. These were partitioned in various ways and the impact on database performance was measured. To gauge this impact, we defined one synthetic access pattern per dataset and executed each one on the partitioned datasets. Evaluation took place in a simulation environment, which ensured repeatability and made it possible to measure certain metrics, such as network traffic and load balance.

Simulation results show that, compared to random partitioning, use of a graph partitioning algorithm reduced inter-partition traffic by 40 to 90%, depending on dataset. Executing the algorithm intermittently during database usage was shown to maintain partition quality, while requiring only 1% of the computation time of initially partitioning the datasets. Finally, a strong correlation was found between theoretic graph partitioning quality metrics and the generated inter-partition traffic under non-uniform access patterns. Our results suggest that use of such algorithms to partition graph databases can result in significant performance benefits, and warrants further investigation.

CS359G: Graph Partitioning and Expanders

Monday, January 17th, 2011

CS359G: Graph Partitioning and Expanders

Luca Trevisan’s course at Stanford on Graph Partitioning.

I arrived at this course page after finding the blog for the class: Cs359g.

Graph partitioning is relevant to clustering on the basis of similarity, or as stated in the overview blog post:

Finding balanced separators and sparse cuts arises in clustering problems, in which the presence of an edge denotes a relation of similarity, and one wants to partition vertices into few clusters so that, for the most part, vertices in the same cluster are similar and vertices in different clusters are not. For example, sparse cut approximation algorithms are used for image segmentation, by reducing the image segmentation problem to a graph clustering problem in which the vertices are the pixels of the image and the (weights of the) edges represent similarities between nearby pixels.

Depending upon the properties you are using as the basis for similarity (identity?), graph partitioning is likely to be relevant to your topic map application.