Installing Distributed Solr 4 with Fabric by Martijn Koster
From the post:
Solr 4 has a subset of features that allow it to be run as a distributed fault-tolerant cluster, referred to as “SolrCloud”. Installing and configuring Solr on a multi-node cluster can seem daunting when you’re a developer who just wants to give the latest release a try. The wiki page is long and complex, and configuring nodes manually is laborious and error-prone. And while your OS has ZooKeeper/Solr packages, they are probably outdated. But it doesn’t have to be a lot of work: in this post I will show you how to deploy and test a Solr 4 cluster with just a few commands, using mechanisms you can easily adjust for your own deployments.
I am using a cluster consisting of virtual machines running Ubuntu 12.04 64-bit and I am controlling them from my MacBook Pro. The Solr configuration will mimic the “Two shard cluster with shard replicas and zookeeper ensemble” example from the wiki.
You can run this on AWS EC2, but some special considerations apply; see the footnote.
We’ll use Fabric, a light-weight deployment tool that is basically a Python library to easily execute commands on remote nodes over ssh. Compared to Chef/Puppet it is simpler to learn and use, and because it’s an imperative approach it makes sequential orchestration of dependencies more explicit. Most importantly, it does not require a separate server or separate node-side software installation.
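To make the imperative, ssh-based style concrete, here is a minimal sketch that uses only the Python standard library rather than Fabric itself. It is not the post's script: the host names, the placeholder commands, and the helper names (plan_deployment, ssh_argv, deploy) are all hypothetical, chosen only to show how an ordered list of steps makes the ZooKeeper-before-Solr dependency explicit.

```python
import shlex
import subprocess

# Hypothetical host names; the post does not list its nodes here.
ZOOKEEPER_HOSTS = ["zk1.example.com"]
SOLR_HOSTS = ["solr1.example.com", "solr2.example.com"]


def plan_deployment(zk_hosts, solr_hosts):
    """Return ordered (host, command) steps: ZooKeeper nodes first, then Solr."""
    steps = []
    for host in zk_hosts:
        # Placeholder command; a real script would install and start ZooKeeper.
        steps.append((host, "echo 'set up ZooKeeper here'"))
    for host in solr_hosts:
        # Placeholder command; a real script would install and start Solr.
        steps.append((host, "echo 'set up Solr here'"))
    return steps


def ssh_argv(host, command):
    """Build the argument list for running one command on one host over ssh."""
    return ["ssh", host, command]


def deploy(steps, dry_run=True):
    """Run each step in order; with dry_run=True, only print the ssh commands."""
    for host, command in steps:
        argv = ssh_argv(host, command)
        if dry_run:
            print(" ".join(shlex.quote(arg) for arg in argv))
        else:
            # check=True stops the sequence as soon as one step fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    deploy(plan_deployment(ZOOKEEPER_HOSTS, SOLR_HOSTS))
```

Run as is, the sketch only prints the ssh commands it would execute. Fabric wraps this same pattern with connection handling, host lists, and helpers for running commands and uploading files, so a real deployment would use its API instead of calling ssh by hand.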
DISCLAIMER: these instructions and associated scripts are released under the Apache License; use at your own risk.
I strongly recommend you use disposable virtual machines to experiment with.
Something to get you excited about the upcoming weekend!
Enjoy!