Archive for the ‘Trinity’ Category

A Distributed Graph Engine…

Friday, March 22nd, 2013

A Distributed Graph Engine for Web Scale RDF Data by Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao and Zhongyuan Wang.

Abstract:

Much work has been devoted to supporting RDF data. But state-of-the-art systems and methods still cannot handle web scale RDF data effectively. Furthermore, many useful and general purpose graph-based operations (e.g., random walk, reachability, community discovery) on RDF data are not supported, as most existing systems store and index data in particular ways (e.g., as relational tables or as a bitmap matrix) to maximize one particular operation on RDF data: SPARQL query processing. In this paper, we introduce Trinity.RDF, a distributed, memory-based graph engine for web scale RDF data. Instead of managing the RDF data in triple stores or as bitmap matrices, we store RDF data in its native graph form. It achieves much better (sometimes orders of magnitude better) performance for SPARQL queries than the state-of-the-art approaches. Furthermore, since the data is stored in its native graph form, the system can support other operations (e.g., random walks, reachability) on RDF graphs as well. We conduct comprehensive experimental studies on real life, web scale RDF data to demonstrate the effectiveness of our approach.

From the conclusion:

We propose a scalable solution for managing RDF data as graphs in a distributed in-memory key-value store. Our query processing and optimization techniques support SPARQL queries without relying on join operations, and we report performance numbers of querying against RDF datasets of billions of triples. Besides scalability, our approach also has the potential to support queries and analytical tasks that are far more advanced than SPARQL queries, as RDF data is stored as graphs. In addition, our solution only utilizes basic (distributed) key-value store functions and thus can be ported to any in-memory key-value store.
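The conclusion's claim, that SPARQL queries can be answered by exploring a graph held in a plain key-value store rather than by joining tables, can be illustrated with a toy sketch. This is not Trinity.RDF's algorithm or API: `KVGraph`, `explore` and the cell layout are hypothetical, a Python dict stands in for the distributed in-memory store, predicates are assumed to be constants, and the triple patterns are assumed to be ordered so each one touches an already-bound node (a real query optimizer would choose that order).

```python
class KVGraph:
    """Hypothetical sketch: RDF triples kept as adjacency lists in a plain
    key-value map (a dict standing in for a distributed in-memory store)."""

    def __init__(self):
        # key: node id -> value: {"out": [(pred, obj)], "in": [(pred, subj)]}
        self.store = {}

    def _cell(self, node):
        return self.store.setdefault(node, {"out": [], "in": []})

    def add_triple(self, s, p, o):
        # Each triple is recorded in both endpoints' cells so a pattern
        # can be explored from whichever end is already known.
        self._cell(s)["out"].append((p, o))
        self._cell(o)["in"].append((p, s))

    def get(self, node):
        # The only store operation query processing uses: a key lookup.
        return self.store.get(node, {"out": [], "in": []})


def is_var(term):
    return term.startswith("?")


def resolve(term, binding):
    # A constant resolves to itself; a variable to its binding, or None.
    return binding.get(term) if is_var(term) else term


def explore(graph, patterns):
    """Match (subject, predicate, object) patterns by walking edges from
    bound nodes; no table joins. Assumes constant predicates and an order
    in which every pattern has a constant or already-bound endpoint."""
    bindings = [{}]
    for s, p, o in patterns:
        next_bindings = []
        for b in bindings:
            s_val, o_val = resolve(s, b), resolve(o, b)
            if s_val is not None:
                # Walk outgoing edges of the known subject.
                for pred, obj in graph.get(s_val)["out"]:
                    if pred == p and (o_val is None or obj == o_val):
                        nb = dict(b)
                        if o_val is None:
                            nb[o] = obj
                        next_bindings.append(nb)
            elif o_val is not None:
                # Walk incoming edges of the known object.
                for pred, subj in graph.get(o_val)["in"]:
                    if pred == p:
                        nb = dict(b)
                        nb[s] = subj
                        next_bindings.append(nb)
            else:
                raise ValueError(f"no bound endpoint in {(s, p, o)}; reorder patterns")
        bindings = next_bindings
    return bindings


g = KVGraph()
for triple in [("Alice", "knows", "Bob"), ("Bob", "knows", "Carol"),
               ("Alice", "worksAt", "MSRA"), ("Carol", "worksAt", "MSRA")]:
    g.add_triple(*triple)

# Roughly: SELECT ?x ?y WHERE { Alice knows ?x . ?x knows ?y . ?y worksAt MSRA }
print(explore(g, [("Alice", "knows", "?x"), ("?x", "knows", "?y"), ("?y", "worksAt", "MSRA")]))
# [{'?x': 'Bob', '?y': 'Carol'}]
```

A relational triple store would typically answer that last query with self-joins on a triple table; here it is answered by walking Alice → Bob → Carol and checking Carol's cell, using nothing but key lookups, which is why an approach like this only needs basic key-value store functions.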

A result that:

  • is scalable
  • goes beyond SPARQL
  • can be ported to any in-memory key-value store

merits a very close read.

It makes me curious: what other data models would work better if cast as graphs?

I first saw this in a tweet by Juan Sequeda.

Efficient Subgraph Matching on Billion Node Graphs [Parallel Graph Processing]

Sunday, September 2nd, 2012

Efficient Subgraph Matching on Billion Node Graphs by Zhao Sun (Fudan University, China), Hongzhi Wang (Harbin Institute of Technology, China), Haixun Wang (Microsoft Research Asia, China), Bin Shao (Microsoft Research Asia, China) and Jianzhong Li (Harbin Institute of Technology, China).

Abstract:

The ability to handle large scale graph data is crucial to an increasing number of applications. Much work has been dedicated to supporting basic graph operations such as subgraph matching, reachability, regular expression matching, etc. In many cases, graph indices are employed to speed up query processing. Typically, most indices require either super-linear indexing time or super-linear indexing space. Unfortunately, for very large graphs, super-linear approaches are almost always infeasible. In this paper, we study the problem of subgraph matching on billion-node graphs. We present a novel algorithm that supports efficient subgraph matching for graphs deployed on a distributed memory store. Instead of relying on super-linear indices, we use efficient graph exploration and massive parallel computing for query processing. Our experimental results demonstrate the feasibility of performing subgraph matching on web-scale graph data.
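The abstract's idea of replacing super-linear indices with graph exploration can be shown in miniature. The sketch below is a plain backtracking matcher, not the algorithm from the paper: `PartitionedGraph` and `match` are hypothetical names, hash partitioning over a few Python dicts stands in for the distributed memory store, the query graph is assumed to be small, connected and undirected, and the only lookup beyond adjacency lists is a linear scan by node label.

```python
import zlib


class PartitionedGraph:
    """Hypothetical stand-in for a distributed memory store: each node's
    label and neighbor set live in the partition that owns the node id."""

    def __init__(self, num_partitions=4):
        self.parts = [{} for _ in range(num_partitions)]

    def _owner(self, node):
        # Deterministic hash partitioning on the node id.
        return self.parts[zlib.crc32(node.encode()) % len(self.parts)]

    def add_node(self, node, label):
        self._owner(node)[node] = {"label": label, "nbrs": set()}

    def add_edge(self, u, v):
        # Undirected edge, stored in both endpoints' cells.
        self._owner(u)[u]["nbrs"].add(v)
        self._owner(v)[v]["nbrs"].add(u)

    def cell(self, node):
        return self._owner(node)[node]

    def nodes_with_label(self, label):
        # Linear scan of every partition (no super-linear index).
        return [n for part in self.parts for n, c in part.items() if c["label"] == label]


def match(graph, q_labels, q_edges):
    """Return every injective mapping of query nodes to data nodes that
    preserves labels and query edges, found by exploring from matched nodes."""
    q_adj = {u: set() for u in q_labels}
    for a, b in q_edges:
        q_adj[a].add(b)
        q_adj[b].add(a)

    # BFS order over the (assumed connected) query graph, so every query
    # node after the first has an already-matched neighbor.
    order = [next(iter(q_labels))]
    seen = set(order)
    for u in order:
        for w in sorted(q_adj[u]):
            if w not in seen:
                seen.add(w)
                order.append(w)

    results = []

    def extend(i, mapping):
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        matched = [w for w in q_adj[u] if w in mapping]
        if matched:
            # Exploration: candidates are data neighbors of a matched node.
            candidates = graph.cell(mapping[matched[0]])["nbrs"]
        else:
            candidates = graph.nodes_with_label(q_labels[u])
        used = set(mapping.values())
        for v in candidates:
            c = graph.cell(v)
            if v in used or c["label"] != q_labels[u]:
                continue
            # Every query edge back to a matched node must exist in the data.
            if all(mapping[w] in c["nbrs"] for w in matched):
                mapping[u] = v
                extend(i + 1, mapping)
                del mapping[u]

    extend(0, {})
    return results


g = PartitionedGraph(num_partitions=3)
for node, label in [("a", "A"), ("b", "B"), ("c", "C"), ("d", "A")]:
    g.add_node(node, label)
for u, v in [("a", "b"), ("b", "c"), ("c", "a"), ("d", "b")]:
    g.add_edge(u, v)

path = match(g, {"x": "A", "y": "B", "z": "C"}, [("x", "y"), ("y", "z")])
triangle = match(g, {"x": "A", "y": "B", "z": "C"}, [("x", "y"), ("y", "z"), ("z", "x")])
print(len(path), len(triangle))  # 2 1
```

Only the first query node is found by scanning labels; every later query node is drawn from the neighbor set of an already-matched node, so nothing larger than the graph itself is ever built. In an actual distributed setting each partition would do its share of the scanning and exploration in parallel; here everything runs in one process.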

Did you say you were interested in parallel graph processing?

This paper and the materials cited in the bibliography make a nice introduction to the current options for graph processing.

I first saw this at Alex Popescu’s myNoSQL, citing it from the VLDB proceedings.

With the DBLP-enhanced version of the VLDB proceedings, VLDB 2012 Ice Breaker v0.1, adding DBLP links for the authors was easy.

NoSQL Paper: The Trinity Graph Engine

Wednesday, March 14th, 2012

NoSQL Paper: The Trinity Graph Engine.

Alex Popescu of myNoSQL has discovered a paper on the MS Trinity Graph Engine.

There hasn’t been a lot of information on it, so this could be helpful.

Thanks Alex!

Trinity Podcast

Friday, August 26th, 2011

Microsoft Research: Trinity is a Graph Database and a Distributed Parallel Platform for Graph Data

Episode of Hanselminutes, a weekly audio talk show with noted web developer and technologist Scott Hanselman and hosted by Carl Franklin.

Scott talks via Skype to Haixun Wang at Microsoft Research Asia about Trinity: a distributed graph database and computing platform. What is a GraphDB? How is it different from a traditional Relational DB, a Document DB or even just a naive in-memory distributed data structure? Will your next database be a graph database?

The interview is quite entertaining and, leaving the booster comments aside, quite informative.

Relational database vendors may be surprised to hear their products described as good for “small data.” In their defense (as if they needed one), I would note there is a lot of money to be made in “small data.”

For further information, see the Trinity project homepage at Microsoft Research.

Trinity code is available only for internal release (🙁), but you can look at the Trinity Manual.