SIREn: Efficient semi-structured Information Retrieval for Lucene
From the announcement:
Efficient, large-scale handling of semi-structured data (including RDF) is an increasingly important issue in many web and enterprise information reuse scenarios.
Querying graph-structured data (RDF) is commonly achieved using specific solutions, called triplestores, typically based on DBMS backends. At Sindice, however, we needed something much more scalable than a DBMS, with the desirable features of typical Web search engines: top-k query processing, real-time updates, full-text search, distributed indexes over shards, etc.
While Lucene has long offered these features, it was not designed for large semi-structured document collections (or documents with very different schemas). For this reason we developed SIREn (Semantic Information Retrieval Engine), a Lucene plugin that overcomes these shortcomings and efficiently indexes and queries RDF, as well as any textual document with an arbitrary number of metadata fields.
Given its general applicability, we are delighted to release SIREn under the Apache 2.0 open source license. We hope businesses will find SIREn useful in implementing solutions on top of the Web of Data.
You can start by looking at the features and reviewing the performance benchmarks, learn more by reading the short tutorial, and then download and try SIREn yourself.
This looks very cool!
Its tuple-processing capabilities in particular!