Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 26, 2011

Building Distributed Indexing for Solr: MurmurHash3 for Java

Filed under: Indexing,Java,Solr — Patrick Durusau @ 7:01 pm

Building Distributed Indexing for Solr: MurmurHash3 for Java by Yonik Seeley.

From the post:

Background

I needed a really good hash function for the distributed indexing we’re implementing for Solr. Since it will be used for partitioning documents, it needed to be really high quality (well distributed) since we don’t want uneven shards. It also needs to be cross-platform, so a client could calculate this hash value themselves if desired, to predict which node has a given document.

MurmurHash3

MurmurHash3 is one of the most popular new hash functions these days, being both really fast and of high quality. Unfortunately it’s written in C++, and a quick Google search did not yield any suitable high-quality port. So I took 15 minutes (it’s small!) to port the 32-bit version, since it should be faster than the other versions for small keys like document ids. It works in 32-bit chunks and produces a 32-bit hash – more than enough for partitioning documents by hash code.
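To give a feel for how small the 32-bit variant is, here is a minimal Java sketch of MurmurHash3 (x86_32) written from the published algorithm. It is not the code from Yonik's post: the class name `Murmur3Sketch`, the method names, and the `shardFor` helper are hypothetical, and the simple modulo partitioning is only an assumption for illustration, not necessarily how Solr assigns documents to shards.

```java
import java.nio.charset.StandardCharsets;

final class Murmur3Sketch {

    /** MurmurHash3 x86_32 over data[offset .. offset+len), little-endian block reads. */
    static int murmurhash3_x86_32(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;

        int h1 = seed;
        int roundedEnd = offset + (len & 0xfffffffc); // end of the last full 4-byte block

        // Body: mix one 32-bit block at a time.
        for (int i = offset; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                   | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16)
                   | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;

            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: fold in the remaining 1 to 3 bytes, if any.
        int k1 = 0;
        switch (len & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }

        // Finalization: force all bits of the hash to avalanche.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Convenience overload: hash the UTF-8 bytes of a document id. */
    static int hash(String id, int seed) {
        byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
        return murmurhash3_x86_32(bytes, 0, bytes.length, seed);
    }

    /** Hypothetical partitioning rule (an assumption): hash modulo shard count. */
    static int shardFor(String id, int numShards) {
        return Math.floorMod(hash(id, 0), numShards);
    }
}
```

Because the function depends only on the key's bytes and the seed, a client in any language that implements the same steps can compute `shardFor("doc-42", 4)` itself and predict where a document lives, which is the cross-platform property the post asks for.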

Something for your Solr friends.
