Beginner Tips For Elastic MapReduce by John Berryman.
From the post:
By this point everyone is well acquainted with the power of Hadoop’s MapReduce. But what you’re also probably well acquainted with is the pain that must be suffered when setting up your own Hadoop cluster. Sure, there are some really good tutorials online if you know where to look:
- here is a great one for setting up a single node cluster
- and here is the equally good follow-on tutorial for setting up a complete cluster
However, I’m not much of a dev ops guy so I decided I’d take a look at Amazon’s Elastic MapReduce (EMR) and for the most part I’ve been very pleased. However, I did run into a couple of difficulties, and hopefully this short article will help you avoid my pitfalls.
I often dream of setting up a cluster that requires a newspaper hat because of the oil from cooling the coils. Wait! That was a replica of the early cyclotron. Sorry, wrong experiment. 😉
I mean a cluster of computers humming and driving up my cooling bills.
But there are alternatives.
Amazon’s Elastic MapReduce (EMR) is one.
You can learn Hadoop with Hortonworks Sandbox and when you need production power, EMR awaits.
From a cost-effectiveness standpoint, that sounds like a good deal to me.
You?
PS: Someone told me today that Amazon isn’t a reliable cloud because they have downtime. It is true that Amazon does have downtime, but that by itself isn’t a deciding factor.
You have to consider the relationship between Amazon’s aggressive pricing and how much reliability you need.
If you are running flight control for a moon launch, you probably should not use a public cloud.
Or for a heart surgery theater. And a few other places like that.
If you mean the web services for your NGO of fewer than 4,000 members, 100% guaranteed uptime is a recipe for someone making money off of you.
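To put rough numbers on that trade-off, here is a minimal sketch that converts an uptime percentage into hours of downtime per year. The availability levels and the function name `downtime_hours_per_year` are my own illustrative assumptions, not figures from Amazon or from the post.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year implied by an uptime percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    # Illustrative availability levels only (assumed, not quoted from any SLA).
    for pct in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, and someone has to pay for that. Whether the extra hours matter depends on what you are running.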