Large-Scale Data Analysis Beyond Map/Reduce by Fabian Hüske.
From the description:
Stratosphere is a joint project by TU Berlin, HU Berlin, and HPI Potsdam and researches “Information Management on the Cloud”. In the course of the project, a massively parallel data processing system is built. The current version of the system consists of the parallel PACT programming model, a database inspired optimizer, and the parallel dataflow processing engine, Nephele. Stratosphere has been released as open source. This talk will focus on the PACT programming model, which is a generalization of Map/Reduce, and show how PACT eases the specification of complex data analysis tasks. At the end of the talk, an overview of Stratosphere’s upcoming release will be given.
In Stratosphere, the parallel programming model is separated from the execution engine (unlike Hadoop).
Interesting demonstration of the differences between the Hadoop and PACT programming models.
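To make "generalization of Map/Reduce" a little more concrete, here is a minimal single-machine sketch of the idea as I understand it: PACT keeps Map and Reduce as second-order functions that wrap user code, and adds contracts that take two inputs, such as Match (roughly an equi-join on the key). Nothing below is Stratosphere's actual API. The names `map_contract`, `reduce_contract`, and `match_contract`, the (key, value) tuple representation, and the sample data are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; not Stratosphere's API.
# Records are (key, value) tuples; every user function returns a list of outputs.

def map_contract(records, udf):
    """Map: call udf independently on every record."""
    return [out for key, value in records for out in udf(key, value)]

def reduce_contract(records, udf):
    """Reduce: group records by key and call udf once per group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return [out for key, values in groups.items() for out in udf(key, values)]

def match_contract(left, right, udf):
    """Match: two inputs; call udf on every pair of records sharing a key."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    return [out
            for key, lvalue in left
            for rvalue in right_by_key.get(key, [])
            for out in udf(key, lvalue, rvalue)]

# Made-up sample data: customers keyed by id, orders keyed by customer id.
customers = [(1, "Ada"), (2, "Grace")]
orders = [(1, 30.0), (1, 12.5), (2, 7.0), (3, 99.0)]

# Join the two inputs with Match, then total the amounts per customer with Reduce.
joined = match_contract(customers, orders,
                        lambda key, name, amount: [(name, amount)])
totals = reduce_contract(joined,
                         lambda name, amounts: [(name, sum(amounts))])
```

With only Map and Reduce, a join like this usually has to be encoded by hand, for example by tagging each record with its source and separating the two sides inside the reduce function. With a two-input contract the join is stated directly, which presumably is also what gives an optimizer room to choose how to execute it.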
Home: Stratosphere: Above the Clouds
I first saw this at DZone.