From the post:
But Pig is more than that. Pig has emerged as the ‘duct tape’ of Big Data, enabling you to send data between distributed systems in a few lines of code. In this series, we’re going to show you how to use Hadoop and Pig to connect different distributed systems to enable you to process data from wherever and to wherever you like.
Working code for this post as well as setup instructions for the tools we use and their environment variables are available at https://github.com/rjurney/enron-python-flask-cassandra-pig and you can download the Enron emails we use in the example in Avro format at http://s3.amazonaws.com/rjurney.public/enron.avro. You can run our example Pig scripts in local mode (without Hadoop) with the -x local flag: pig -x local. This enables new Hadoop users to try out Pig without a Hadoop cluster.
With this post in the series, “duct tape,” made it into the title.
In case you don’t know (I didn’t), Flask is a “lightweight web application framework in Python.”
Just once I would like to see a “heavyweight, cumbersome, limited and annoying web application framework in (insert language of your choice).”
Just for variety.
Rather than characterizing software, say what it does.
Sorry, I have been converting one of the most poorly edited documents I have ever seen into a csv file. Proofing will follow the conversion process but hope to finish that by the end of next week.