From Andreas Harth and Günter Ladwig:
[W]e are happy to announce the first public release of CumulusRDF, a Linked Data server that uses Apache Cassandra [1] as a cloud-based storage backend. CumulusRDF provides a simple HTTP interface [2] to manage RDF data stored in an Apache Cassandra cluster.
Features
* By way of Apache Cassandra, CumulusRDF provides distributed, fault-tolerant and elastic RDF storage
* Supports Linked Data and triple pattern lookups
* Proxy mode: CumulusRDF can act as a proxy server [3] for other Linked Data applications, allowing to deploy any RDF dataset as Linked Data
This is a first beta release that is still somewhat rough around the edges, but the basic functionality works well. The HTTP interface is work-in-progress. Eventually, we plan to extend the storage model to support quads.
CumulusRDF is available from http://code.google.com/p/cumulusrdf/
See http://code.google.com/p/cumulusrdf/wiki/GettingStarted to get started using CumulusRDF.
There is also a paper [4] on CumulusRDF that I presented at the Scalable Semantic Knowledge Base Systems (SSWS) workshop at ISWC last week.
Cheers,
Andreas Harth and Günter Ladwig
[1] http://cassandra.apache.org/
[2] http://code.google.com/p/cumulusrdf/wiki/HttpInterface
[3] http://code.google.com/p/cumulusrdf/wiki/ProxyMode
[4] http://people.aifb.kit.edu/gla/cumulusrdf/cumulusrdf-ssws2011.pdf
Everybody knows I hate to be picky, but the abstract of [4] promises:
Results on a cluster of up to 8 machines indicate that CumulusRDF is competitive to state-of-the-art distributed RDF stores.
But I didn’t see any comparison to “state-of-the-art” RDF stores, distributed or not. Did I just overlook something?
I ask because I think this approach has promise, at least as an exploration of indexing strategies for RDF and how usage scenarios may influence those strategies. But that will be difficult to evaluate without a comparison against less imaginative approaches to RDF indexing.
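For readers wondering what an "indexing strategy" for triple pattern lookups looks like in practice, here is a minimal in-memory sketch in Python. It is my own illustration, not CumulusRDF's actual Cassandra layout: the class name `TripleIndex`, the helper `_scan`, and the choice of three orderings (SPO, POS, OSP) are all assumptions made for the example. The idea is simply that each triple is stored under several key orderings so that the bound terms of any pattern form a prefix of one of them.

```python
class TripleIndex:
    """Toy triple store (hypothetical, for illustration only).

    Each triple is stored three times, under the orderings SPO, POS and OSP,
    so that every triple pattern can be answered by a prefix lookup.
    """

    def __init__(self):
        # Each index is a nested dict: first key -> second key -> set of third values.
        self.spo = {}
        self.pos = {}
        self.osp = {}

    def add(self, s, p, o):
        self.spo.setdefault(s, {}).setdefault(p, set()).add(o)
        self.pos.setdefault(p, {}).setdefault(o, set()).add(s)
        self.osp.setdefault(o, {}).setdefault(s, set()).add(p)

    @staticmethod
    def _scan(index, k1, k2, k3):
        """Yield (k1, k2, k3) entries from a nested index; None is a wildcard."""
        firsts = [k1] if k1 is not None else list(index)
        for a in firsts:
            if a not in index:
                continue
            level2 = index[a]
            seconds = [k2] if k2 is not None else list(level2)
            for b in seconds:
                if b not in level2:
                    continue
                for c in level2[b]:
                    if k3 is None or c == k3:
                        yield a, b, c

    def match(self, s=None, p=None, o=None):
        """Return a sorted list of (s, p, o) triples matching the pattern."""
        if s is not None and not (p is None and o is not None):
            # Patterns (s p o), (s p ?), (s ? ?): bound terms are a prefix of SPO.
            return sorted(self._scan(self.spo, s, p, o))
        if s is None and p is not None:
            # Patterns (? p o), (? p ?): bound terms are a prefix of POS.
            return sorted((s2, p2, o2) for p2, o2, s2 in self._scan(self.pos, p, o, s))
        if o is not None:
            # Patterns (? ? o), (s ? o): bound terms are a prefix of OSP.
            return sorted((s2, p2, o2) for o2, s2, p2 in self._scan(self.osp, o, s, p))
        # Pattern (? ? ?): nothing is bound, so scan everything.
        return sorted(self._scan(self.spo, None, None, None))


store = TripleIndex()
store.add("ex:a", "foaf:knows", "ex:b")
store.add("ex:a", "foaf:name", "Alice")
store.add("ex:c", "foaf:knows", "ex:b")

# Who knows ex:b? Answered from the POS ordering.
print(store.match(p="foaf:knows", o="ex:b"))
```

The trade-off is visible even in a toy: every extra ordering speeds up some class of patterns at the cost of writing each triple one more time. Which orderings are worth paying for depends on the queries you expect, which is exactly why a comparison against other stores would be informative.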