From the post:
We are happy to announce that Cascading 2.0 is now publicly available for download.
This release includes a number of new features. Specifically:
- Apache 2.0 Licensing
- Support for Hadoop 1.0.2
- Local and Hadoop planner modes, where local runs in memory without Hadoop dependencies
- HashJoin pipe for “map side joins”
- Merge pipe for “map side merges”
- Simple Checkpointing for capturing intermediate data as a file
- Improved Tap and Scheme APIs
We have also created a new top-level project on GitHub for all community sponsored Cascading projects:
From the documentation:
What is Cascading?
Cascading is a data processing API and processing query planner used for defining, sharing, and executing data-processing workflows on a single computing node or distributed computing cluster. On a single node, Cascading’s “local mode” can be used to efficiently test code and process local files before being deployed on a cluster. On a distributed computing cluster using Apache Hadoop platform, Cascading adds an abstraction layer over the Hadoop API, greatly simplifying Hadoop application development, job creation, and job scheduling.
Don’t miss the extensions to Cascading: Cascading Extensions. Any summary would be unfair. Take a look for yourself. Coverage of any of these you would like to point out?
I first spotted Cascading 2.0 at Alex Popescu’s myNoSQL.