Avoiding Cluster-Scale Headaches with Better Tools for Data Quality and Feature Engineering by Ted Willke.
Ted’s second slide reads:
Machine Learning may nourish the soul…
…but Data Preparation will consume it.
Ted starts off with the problems of data preparation in general but fairly quickly narrows his focus to property graphs and ETL with Pig.
He also outlines outstanding problems with Pig ETL (slides 29-32).
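For readers new to the term, a property graph is one whose vertices and edges each carry key/value properties. The Python sketch below is only an illustration of turning tabular records into such a graph, including the sort of data quality check (dropping rows with missing keys) that makes preparation tedious. It is not GraphBuilder's API and not Pig; the `PropertyGraph` class, the `build_graph` function, and the `user`/`item`/`rating` field names are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Hypothetical in-memory property graph (not the GraphBuilder API)."""
    vertices: dict = field(default_factory=dict)  # vertex id -> properties
    edges: list = field(default_factory=list)     # (src, dst, label, properties)

    def add_vertex(self, vid, **props):
        # Create the vertex if needed, then merge in any new properties.
        self.vertices.setdefault(vid, {}).update(props)

    def add_edge(self, src, dst, label, **props):
        # Make sure both endpoints exist before recording the edge.
        self.add_vertex(src)
        self.add_vertex(dst)
        self.edges.append((src, dst, label, props))


def build_graph(rows):
    """Turn tabular records into a property graph, skipping incomplete rows."""
    g = PropertyGraph()
    for row in rows:
        user, item = row.get("user"), row.get("item")
        if not user or not item:
            continue  # data quality check: drop rows missing a key field
        g.add_vertex(user, kind="user")
        g.add_vertex(item, kind="item")
        g.add_edge(user, item, "rated", rating=row.get("rating"))
    return g


# Made-up sample records; the third one is deliberately incomplete.
rows = [
    {"user": "alice", "item": "book-1", "rating": 5},
    {"user": "bob", "item": "book-1", "rating": 3},
    {"user": None, "item": "book-2", "rating": 4},
]
graph = build_graph(rows)
```

At cluster scale the same extract, clean, and link steps would be expressed in an ETL tool such as Pig rather than in a single Python process.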
Nothing surprising, but the good news is that GraphBuilder 2 Alpha is due out in December 2013.
BTW, GraphBuilder 1.0 can be found at: https://01.org/graphbuilder/