From the post:
MapR Technologies released a big data toolkit based on Apache Hadoop, with its own distributed storage alternative to HDFS. The software is commercial, with MapR offering both a free version, M3, and a paid version, M5. M5 adds data snapshots and mirroring, JobTracker recovery, and commercial support. MapR’s M5 edition will form the basis of EMC Greenplum’s upcoming HD Enterprise Edition, whereas EMC Greenplum’s HD Community Edition will be based on Facebook’s Hadoop distribution rather than on MapR technology.
At the Hadoop Summit last week, MapR Technologies announced the general availability of its "Next Generation Distribution for Apache Hadoop." InfoQ interviewed CEO John Schroeder and VP of Marketing Jack Norris to learn more about the company's approach. MapR claims to improve MapReduce and HBase performance by a factor of two to five and to eliminate single points of failure in Hadoop. Schroeder says the company measures performance against competing distributions by timing benchmarks such as DFSIO, Terasort, YCSB, Gridmix, and Pigmix. He also said that customers testing MapR’s technology are seeing a three- to five-fold performance improvement over the versions of Hadoop they previously used. Schroeder reports that MapR had 35 beta testers and demonstrated linear scalability on clusters of up to 160 nodes. According to MapR, several of the beta customers now run its technology in production, including one with a 140-node production cluster and another that "is looking at deploying MapR on 2000 nodes." For comparison, Yahoo is believed to run the largest Hadoop clusters, consisting of 4,000 nodes running Apache Hadoop. Competitor Cloudera claimed in March 2011 to have more than 80 customers running Hadoop in production, and as of July 2011 it reported 22 clusters running Cloudera’s distribution that each hold over a petabyte.
Remember, Hadoop is a buzzword in U.S. government circles.