Apache Hadoop 2.3.0 Released! by Arun Murthy.
From the post:
It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.3.0!
hadoop-2.3.0 is the first release for the year 2014, and brings a number of enhancements to the core platform, in particular to HDFS.
With this release, there are two significant enhancements to HDFS:
- Support for Heterogeneous Storage Hierarchy in HDFS (HDFS-2832)
- In-memory Cache for data resident in HDFS via Datanodes (HDFS-4949)
With support for heterogeneous storage classes in HDFS, we now can take advantage of different storage types on the same Hadoop clusters. Hence, we can now make better cost/benefit tradeoffs with different storage media such as commodity disks, enterprise-grade disks, SSDs, Memory etc. More details on this major enhancement are available here.
Along similar lines, it is now possible to use memory available in the Hadoop cluster to centrally cache and administer data-sets in-memory in the Datanode’s address space. Applications such as MapReduce, Hive, Pig etc. can now request for memory to be cached (for the curious, we use a combination of mmap, mlock to achieve this) and then read it directly off the Datanode’s address space for extremely efficient scans by avoiding disk altogether. As an example, Hive is taking advantage of this feature by implementing an extremely efficient zero-copy read path for ORC files – see HIVE-6347 for details.
…
See Arun’s post for more details.
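For readers wondering what the mmap remark in the quoted passage means in practice, here is a minimal sketch of scanning a memory-mapped file without copying it into a separate buffer. This is not the Datanode implementation, only an illustration of the general idea on a local file; the function names `count_byte_mapped` and `demo` and the sample data are my own, hypothetical choices.

```python
import mmap
import os
import tempfile


def count_byte_mapped(path: str, needle: int) -> int:
    """Count occurrences of one byte value by scanning a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # A memoryview exposes the mapped pages directly, without first
            # copying them into a separate bytes object.
            with memoryview(mapped) as view:
                return sum(1 for b in view if b == needle)


def demo() -> int:
    """Write a small hypothetical sample file, scan it, and clean up."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"a,b,c\n1,2,3\n")
        path = tmp.name
    try:
        return count_byte_mapped(path, ord(","))
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The sketch covers only the mmap half. Pinning the pages in RAM with mlock, which the post also mentions, is left out because, as far as I know, Python's standard `mmap` module does not expose it.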
I guess there really is a downside to open source development.
It’s so much faster than commercial product development cycles. 😉 (Hard to keep up.)