Spring for Hadoop simplifies application development
From the post:
After almost exactly a year of development, SpringSource has released Spring for Hadoop 1.0 with the goal of making the development of Hadoop applications easier for users of the distributed application framework. VMware engineer Costin Leau said in the release announcement that the company has often seen developers use the out-of-the-box tools that come with Hadoop in ways that lead to a “poorly structured collection of command line utilities, scripts and pieces of code stitched together.” Spring for Hadoop aims to change this by applying the Template API design pattern from Spring to Hadoop.
The software provides helper classes such as HBaseTemplate, HiveTemplate and PigTemplate, which interface with the different parts of the Hadoop ecosystem. Java-centric APIs such as Cascading can also be used, with or without additional configuration. The software enables Spring functionality such as thread-safe access to lower level resources and lightweight object mapping in Hadoop applications. Leau also says that Spring for Hadoop is designed to allow projects to grow organically. To do this, users can mix and match various runner classes for scripts and, as the complexity of the application increases, developers can migrate to Spring Batch and manage these processes through a REST-based API.
Spring for Hadoop 1.0 is available from the SpringSource web site under the Apache 2.0 License. The developers say they are testing the software daily against various Hadoop 1.x distributions such as Apache Hadoop and Greenplum HD, as well as Cloudera CDH3 and CDH4. Greenplum HD already includes Spring for Hadoop in its distribution. Support for Hadoop 2.x is expected “in the near future”.
I’m going to leave the characterization of present methods of working with Hadoop to others. 😉
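For readers unfamiliar with the Template pattern mentioned in the post, here is a rough sketch of the idea in plain Java. It does not use Spring for Hadoop or any Hadoop API: FakeConnection, FakeTemplate, execute and find are all hypothetical names invented for illustration, and the real helper classes will differ. The point is only that the template owns opening and closing the low-level resource, while the caller supplies a small callback and a row mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class TemplateSketch {

    /** Hypothetical stand-in for a low-level resource (a table handle, a client, ...). */
    static class FakeConnection implements AutoCloseable {
        static int openCount = 0; // tracks connections that were opened but not yet closed
        private final List<String> rows = Arrays.asList("row1", "row2", "row3");
        private boolean closed = false;

        FakeConnection() { openCount++; }

        List<String> scan() {
            if (closed) throw new IllegalStateException("connection is closed");
            return rows;
        }

        @Override
        public void close() { closed = true; openCount--; }
    }

    /** Hypothetical template: it manages the resource, the caller only supplies callbacks. */
    static class FakeTemplate {

        // Opens a connection, runs the callback, and always closes the connection.
        <T> T execute(Function<FakeConnection, T> action) {
            try (FakeConnection conn = new FakeConnection()) {
                return action.apply(conn);
            }
        }

        // Lightweight object mapping: converts each raw row with the given mapper.
        <T> List<T> find(Function<String, T> rowMapper) {
            return execute(conn -> {
                List<T> out = new ArrayList<>();
                for (String row : conn.scan()) {
                    out.add(rowMapper.apply(row));
                }
                return out;
            });
        }
    }

    public static void main(String[] args) {
        FakeTemplate template = new FakeTemplate();
        List<Integer> lengths = template.find(String::length);
        System.out.println(lengths);                  // prints [4, 4, 4]
        System.out.println(FakeConnection.openCount); // prints 0, nothing left open
    }
}
```

The calling code never touches the connection lifecycle, which is the kind of structure the post contrasts with scripts and utilities stitched together by hand.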