Wukong, Bringing Ruby to Hadoop
From the post:
Wukong is hands down the simplest (and probably the most fun) tool to use with Hadoop. It especially excels at the following use case:
You’ve got a huge amount of data (let that be whatever size you think is huge). You want to perform a simple operation on each record. For example, parsing out fields with a regular expression, adding two fields together, stuffing those records into a data store, etc. These are called map-only jobs. They do NOT require a reduce. Can you imagine writing a Java MapReduce program to add two fields together? Wukong gives you all the power of Ruby backed by all the power (and parallelism) of Hadoop streaming. Before we get into examples, and there will be plenty, let’s make sure you’ve got Wukong installed and running locally.
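To make the "add two fields together" case concrete, here is a minimal sketch of a map-only step in plain Ruby. It does not use Wukong's own API; it only follows the Hadoop streaming convention of reading one record per line and writing one record per line. The record layout (tab-separated, integers in the first two fields) and the names `map_record` and `run_mapper` are assumptions made for illustration.

```ruby
# Matches an optionally negative base-10 integer.
INTEGER_PATTERN = /\A-?\d+\z/

# Hypothetical record layout: tab-separated fields, with integers in the
# first two positions. Returns the record with the sum appended as a new
# last field, or nil when the record is malformed.
def map_record(line)
  fields = line.chomp.split("\t")
  return nil if fields.size < 2
  return nil unless fields[0] =~ INTEGER_PATTERN && fields[1] =~ INTEGER_PATTERN

  sum = fields[0].to_i + fields[1].to_i
  (fields + [sum.to_s]).join("\t")
end

# Streams every line of `input` through map_record and writes the results
# to `output`, silently skipping malformed records. There is no reduce step.
def run_mapper(input, output)
  input.each_line do |line|
    result = map_record(line)
    output.puts(result) if result
  end
end
```

Saved as a script with a final line such as `run_mapper($stdin, $stdout)`, this could be tried locally by piping a tab-separated file into it before handing the same script to a streaming job as its mapper.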
Authoring a topic map involves more than the final act of assembling it. Any number of pre-assembly steps, such as parsing or cleaning records, may be necessary first. Wukong is one more tool to assist in that process.