Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 9, 2013

The Blur Project: Marrying Hadoop with Lucene

Filed under: Hadoop,Lucene — Patrick Durusau @ 3:40 pm

The Blur Project: Marrying Hadoop with Lucene by Aaron McCurry.

From the post:

Blur is an Apache Incubator project that provides distributed search functionality on top of Apache Hadoop, Apache Lucene, Apache ZooKeeper, and Apache Thrift. When I started building Blur three years ago, there wasn’t a search solution that had a solid integration with the Hadoop ecosystem. Our initial needs were to be able to index our data using MapReduce, store indexes in HDFS, and serve those indexes from clusters of commodity servers while remaining fault tolerant. Blur was built specifically for Hadoop — taking scalability, redundancy, and performance into consideration from the very start — while leveraging all the great features that already exist in the Hadoop stack.

(…)

Blur was initially released on Github as an Apache Licensed project and was then accepted into the Apache Incubator project in July 2012, with Patrick Hunt as its champion. Since then, Blur as a software project has matured and become much more stable. One of the major milestones over the past year has been the upgrade to Lucene 4, which has brought many new features and massive performance gains.

Recently there has been some interest in folding some of Blur’s code (HDFSDirectory and BlockCache) back into the Lucene project for others to utilize. This is an exciting development that legitimizes some of the approaches that we have taken to date. We are in conversations with some members of the Lucene community, such as Mark Miller, to figure out how we can best work together to benefit both the fledgling Blur project as well as the much larger and more well known/used Lucene project.

Blur’s community is small but growing. Our project goals are to continue to grow our community and graduate from the Incubator project. Our technical goals are to continue to add features that perform well at scale while maintaining the fault tolerance that is required of any modern distributed system.

We welcome your contributions at http://incubator.apache.org/blur/!

Another exciting Apache project that needs contributors!

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress