Archive for the ‘Signal/Collect’ Category

TripleRush: A Fast and Scalable Triple Store

Monday, October 21st, 2013

TripleRush: A Fast and Scalable Triple Store by Philip Stutz, Mihaela Verman, Lorenz Fischer, and Abraham Bernstein.

Abstract:

TripleRush is a parallel in-memory triple store designed to address the need for efficient graph stores that quickly answer queries over large-scale graph data. To that end it leverages a novel, graph-based architecture.

Specifically, TripleRush is built on our parallel and distributed graph processing framework Signal/Collect. The index structure is represented as a graph where each index vertex corresponds to a triple pattern. Partially matched copies of a query are routed in parallel along different paths of this index structure.

We show experimentally that TripleRush takes less than a third of the time to answer queries compared to the fastest of three state-of-the-art triple stores, when measuring time as the geometric mean of all queries for two benchmarks. On individual queries, TripleRush is up to three orders of magnitude faster than other triple stores.

If the abstract hasn’t already caught your interest, consider the following:

The index graph we just described is different from traditional index structures, because it is designed for the efficient parallel routing of messages to triples that correspond to a given triple pattern. All vertices that form the index structure are active parallel processing elements that only interact via message passing.

That is the beginning of section “3.2 Query Processing.” It has a worked example that will repay a close reading.

The processing model outlined here is triple specific, but I don’t see any reason why the principles would not work for other graph structures.
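To make the routing idea concrete, here is a small, self-contained sketch of pattern-directed matching over an in-memory set of triples. It is not TripleRush code and all the names (Triple, Pattern, route, twoStep) are mine; in the paper the index is a graph of Signal/Collect vertices that forward partially bound query copies as messages, whereas in this sketch the “routing” is just a filter.

    // Illustrative sketch only, not the paper's implementation.
    case class Triple(s: String, p: String, o: String)

    // None in a position stands for an unbound variable (a wildcard).
    case class Pattern(s: Option[String], p: Option[String], o: Option[String]) {
      def matches(t: Triple): Boolean =
        s.forall(_ == t.s) && p.forall(_ == t.p) && o.forall(_ == t.o)
    }

    object TripleRoutingSketch {
      val triples = List(
        Triple("alice", "knows", "bob"),
        Triple("bob", "knows", "carol"),
        Triple("alice", "likes", "scala"))

      // A query copy is "routed" to every triple that matches its pattern.
      def route(pattern: Pattern): List[Triple] = triples.filter(pattern.matches)

      // Two-pattern query (alice knows ?x), (?x knows ?y): each match of the
      // first pattern spawns a more fully bound copy of the second pattern,
      // which is roughly what the paper calls a partially matched query copy.
      def twoStep: List[(Triple, Triple)] =
        for {
          first  <- route(Pattern(Some("alice"), Some("knows"), None))
          second <- route(Pattern(Some(first.o), Some("knows"), None))
        } yield (first, second)

      def main(args: Array[String]): Unit = {
        println(route(Pattern(None, Some("knows"), None))) // both "knows" triples
        println(twoStep)                                    // alice -> bob -> carol
      }
    }

In the real system each pattern position addresses a vertex in the index graph and the matches are produced concurrently across many workers, but the binding-and-forwarding step is the same in spirit.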

This is going to the top of my reading list.

I first saw this in a tweet by Stefano Bertolo.

International BASP Frontiers Workshop 2013

Monday, July 16th, 2012

International BASP Frontiers Workshop 2013

January 27th – February 1st, 2013 Villars-sur-Ollon (Switzerland)

The international biomedical and astronomical signal processing (BASP) Frontiers workshop was created to promote synergies between selected topics in astronomy and biomedical sciences, around common challenges for signal processing.

The 2013 workshop will concentrate on the themes of sparse signal sampling and reconstruction, for radio interferometry and MRI, but also open its floor to many other interesting hot topics in theoretical, astrophysical, and biomedical signal processing.

Signal processing is one form of “big data” work, and it is rich in material, both in the literature and in the data itself.

Proceedings from the first BASP workshop are available. Be advised it is a 354 MB zip file, so you may want to wait until you are off airport wifi. You can find those proceedings here.

Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons

Tuesday, February 21st, 2012

Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons

René Pickhardt summarizes two of the papers for tomorrow’s meeting on graph databases:

One of the reading club assignments was to read the papers about Google Pregel and Signal Collect, compare them, and point out the pros and cons of both approaches.

So after I read both papers, as well as Claudio’s overview of Pregel clones, and took some notes, here are my thoughts, but first a short summary of both papers.

What are your thoughts on these or some of the other readings for tomorrow?

Signal/Collect

Saturday, February 18th, 2012

Signal/Collect: a framework for parallel graph processing

I became aware of Signal/Collect because of René Pickhardt’s graph reading club assignment for 22 February 2012.

A paper to use as a starting point for Signal/Collect: Signal/Collect: Graph Algorithms for the (Semantic) Web.

From the code.google.com website (first link above):

Signal/Collect is a programming model and framework for large-scale graph processing. The model is expressive enough to concisely formulate many iterated and data-flow algorithms on graphs, while allowing the framework to transparently parallelize the processing. The current release of the framework is not distributed yet, but this is planned for March 2012.

In Signal/Collect an algorithm is written from the perspective of vertices and edges. Once a graph has been specified the edges will signal and the vertices will collect. When an edge signals it computes a message based on the state of its source vertex. This message is then sent along the edge to the target vertex of the edge. When a vertex collects it uses the received messages to update its state. These operations happen in parallel all over the graph until all messages have been collected and all vertex states have converged.

Many algorithms have very simple and elegant implementations in Signal/Collect. You find more information about the programming model and features in the project wiki. Please take the time to explore some of the example algorithms below.

Signal/Collect development and source code is now on github.
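To get a feel for the model described in the quote, here is a toy, synchronous re-implementation of the signal/collect idea. It is not the project’s API; all the names (Edge, signal, collect, SignalCollectSketch) are made up for illustration. Edges signal (compute a message from their source vertex’s state) and vertices collect (fold incoming messages into a new state), repeating until nothing changes. The example computes unweighted shortest-path distances from vertex 1.

    // Toy synchronous signal/collect loop; not the Signal/Collect framework API.
    case class Edge(source: Int, target: Int)

    object SignalCollectSketch {
      // Vertex state = current best-known distance from vertex 1.
      val edges = List(Edge(1, 2), Edge(2, 3), Edge(1, 3), Edge(3, 4))

      // An edge's message is computed from its source vertex's state.
      def signal(sourceState: Double): Double = sourceState + 1

      // A vertex folds received messages into its new state (keep the best distance).
      def collect(oldState: Double, messages: Seq[Double]): Double =
        (oldState +: messages).min

      def main(args: Array[String]): Unit = {
        var state = Map(1 -> 0.0, 2 -> Double.PositiveInfinity,
                        3 -> Double.PositiveInfinity, 4 -> Double.PositiveInfinity)
        var converged = false
        while (!converged) {
          // Signal phase: every edge sends a message to its target vertex.
          val inbox = edges.groupBy(_.target).map { case (t, es) =>
            t -> es.map(e => signal(state(e.source)))
          }
          // Collect phase: every vertex updates its state from its inbox.
          val next = state.map { case (v, s) => v -> collect(s, inbox.getOrElse(v, Nil)) }
          converged = next == state
          state = next
        }
        println(state) // distances: 1 -> 0, 2 -> 1, 3 -> 1, 4 -> 2
      }
    }

The real framework runs these operations in parallel all over the graph, as the quote says, and the planned release will distribute them across machines; the sequential loop above is only meant to show the division of labor between signaling edges and collecting vertices.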

The name of the project is written variously: Signal/Collect, signal collect, signal-collect. Except when quoting other sources, I will use Signal/Collect.