HaLoop
Reported by Jack Park.
From the website:
The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. However, these new platforms do not have built-in support for iterative programs, which arise naturally in many applications including data mining, web ranking, graph processing, model fitting, and so on.
….
Simply speaking, HaLoop = Ha, Loop:-) HaLoop is a modified version of the Hadoop MapReduce framework, designed to serve these applications. HaLoop not only extends MapReduce with programming support for iterative applications, but also dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms. We evaluate HaLoop on real queries and real datasets and find that, on average, HaLoop reduces query runtimes by 1.85 compared with Hadoop, and shuffles only 4% of the data between mappers and reducers compared with Hadoop.
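To make the caching idea in the description concrete, here is a minimal toy sketch in Python. It is not HaLoop's or Hadoop's API: `map_reduce`, `run_iterations`, and the `EDGES` graph are hypothetical names invented for illustration, and the "job" runs entirely in memory. It assumes a simplified rank-propagation loop (no damping factor) in which the graph structure never changes between iterations, and it only counts how often that loop-invariant input is re-read with and without a cache.

```python
from collections import defaultdict

# Hypothetical toy graph. This is the loop-invariant input: it never
# changes from one iteration to the next.
EDGES = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}


def map_reduce(records, mapper, reducer):
    """Tiny in-memory stand-in for one map/reduce job (not Hadoop's API)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def run_iterations(iterations, cache_invariant):
    """Run a simplified rank propagation; return (ranks, invariant loads)."""
    loads = 0
    cached = None
    ranks = {node: 1.0 / len(EDGES) for node in EDGES}

    for _ in range(iterations):
        if cache_invariant and cached is not None:
            # Reuse the invariant data loaded in an earlier iteration.
            edges = cached
        else:
            # Simulate re-reading the invariant input for this iteration.
            edges = dict(EDGES)
            loads += 1
            cached = edges

        def mapper(item):
            node, rank = item
            targets = edges[node]
            for target in targets:
                # Split this node's rank evenly among its out-links.
                yield target, rank / len(targets)

        def reducer(_node, contributions):
            return sum(contributions)

        ranks = map_reduce(ranks.items(), mapper, reducer)

    return ranks, loads
```

Calling `run_iterations(5, cache_invariant=False)` re-reads the graph on every pass, while `run_iterations(5, cache_invariant=True)` reads it once and produces the same ranks. A real system would save disk and network I/O rather than a dictionary copy, so this shows only the shape of the optimization, not its measured benefit.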
Interesting project, but svn reports that the most recent commit was on 2010-08-23, and the project wiki shows that the UsersManual was modified on 2010-09-04.
I will follow up with the owner and report back.
*****
Update: 2010-12-26 – Email from the project owner advises of activity not reflected at the project site. Updates are to appear in 2011-01. I will probably create another post that links back to this one, and add a forward link from this one.