Russell Jurney summarizes machine learning with Pig at the Hadoop Summit:
Jimmy Lin’s sold-out talk about Large Scale Machine Learning at Twitter (paper available) (slides available) described the use of Pig to train machine learning algorithms at scale using Hadoop. Interestingly, the learning itself happens inside a Pig UDF StoreFunc (documentation available). Ted Dunning has some interesting related work on GitHub (source available).
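To make the StoreFunc idea concrete: instead of writing each incoming record to disk, the storage function hands the record to an online learner, and when the store finishes, what gets persisted is the trained model. The sketch below is a hypothetical illustration of that pattern, not Twitter's code. Real Pig StoreFuncs are written in Java and receive each `Tuple` through `putNext`; here Python stands in for brevity, and the class name `OnlineLogisticRegressionSink`, its `put_next`/`close` methods, the choice of stochastic gradient descent on logistic loss, and the round-robin split into an ensemble of independent learners are all assumptions made for illustration.

```python
import math


class OnlineLogisticRegressionSink:
    """Hypothetical stand-in for a Pig StoreFunc that trains a model instead of writing rows.

    Each record passed to put_next() is (label, features): label is 0 or 1 and
    features maps feature name -> numeric value. One stochastic gradient descent
    step on logistic loss is taken per record; close() returns the learned
    parameters, which a real storage function would serialize (e.g. to HDFS).
    """

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}
        self.bias = 0.0

    def predict(self, features):
        z = self.bias + sum(self.weights.get(name, 0.0) * value
                            for name, value in features.items())
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def put_next(self, record):
        label, features = record
        # For logistic loss, d(loss)/dz = prediction - label.
        error = self.predict(features) - label
        for name, value in features.items():
            self.weights[name] = (self.weights.get(name, 0.0)
                                  - self.learning_rate * error * value)
        self.bias -= self.learning_rate * error

    def close(self):
        # A real StoreFunc would write the model out here instead of returning it.
        return dict(self.weights), self.bias


def train_ensemble(records, num_partitions, learning_rate=0.1):
    """Deal records round-robin to independent learners, one per hypothetical partition."""
    sinks = [OnlineLogisticRegressionSink(learning_rate) for _ in range(num_partitions)]
    for i, record in enumerate(records):
        sinks[i % num_partitions].put_next(record)
    return sinks


def ensemble_predict(sinks, features):
    """Average the member scores (one simple way to combine an ensemble)."""
    return sum(sink.predict(features) for sink in sinks) / len(sinks)


if __name__ == "__main__":
    spam = (1, {"free": 1.0, "offer": 1.0})
    ham = (0, {"meeting": 1.0, "agenda": 1.0})
    # Toy data: with two partitions, each learner sees spam and ham alternately.
    records = [spam, spam, ham, ham] * 25
    sinks = train_ensemble(records, num_partitions=2)
    print(ensemble_predict(sinks, spam[1]), ensemble_predict(sinks, ham[1]))
```

The appeal of this design is that Pig handles reading, filtering, and partitioning the data, while each learner only ever sees a stream of records, so training needs no random access to the full data set. Splitting the stream across several learners and combining their scores is one way to use many mappers or reducers at once.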
The emphasis isn’t on innovation per se but on using Pig to create workflows that include machine learning on large data sets.
Read it in detail for the Pig techniques (which you can reuse elsewhere) as well as the machine learning examples.