Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 25, 2012

Apache Hadoop 2.0 (Alpha) Released

Filed under: Hadoop,HDFS,MapReduce — Patrick Durusau @ 6:15 pm

Apache Hadoop 2.0 (Alpha) Released by Arun Murthy.

From the post:

As the release manager for the Apache Hadoop 2.0 release, it gives me great pleasure to share that the Apache Hadoop community has just released Apache Hadoop 2.0.0 (alpha)! While only an alpha release (read: not ready to run in production), it is still an important step forward as it represents the very first release that delivers new and important capabilities, including:

In addition to these new capabilities, there are several planned enhancements that are on the way from the community, including HDFS Snapshots and auto-failover for HA NameNode, along with further improvements to the stability and performance with the next generation of MapReduce (YARN). There are definitely good times ahead.

Let the good times roll!

February 14, 2012

Cloudera Manager | Service and Configuration Management Demo Videos

Filed under: Cloudera,Hadoop,HBase,HDFS,MapReduce — Patrick Durusau @ 5:11 pm

Cloudera Manager | Service and Configuration Management Demo Videos by Jon Zuanich.

From the post:

Service and Configuration Management (Part I & II)

We’ve recently recorded a series of demo videos intended to highlight the extensive set of features and functions included with Cloudera Manager, the industry’s first end-to-end management application for Apache Hadoop. These demo videos showcase the newly enhanced Cloudera Manager interface and reveal how to use this powerful application to simplify the administration of Hadoop clusters, optimize performance and enhance the quality of service.

In the first two videos of this series, Philip Langdale, a software engineer at Cloudera, walks through Cloudera Manager’s Service and Configuration Management module. He demonstrates how simple it is to set up and configure the full range of Hadoop services in CDH (including HDFS, MR and HBase); enable security; perform configuration rollbacks; and add, delete and decommission nodes.

Interesting that Vimeo detects the “embedding” of these videos in my RSS reader and displays a blocked message. At the Cloudera site, all is well.

Management may not be as romantic as the latest graph algorithms but it is a pre-condition to widespread enterprise adoption.

Introducing CDH4

Filed under: Cloudera,Hadoop,HBase,HDFS,MapReduce — Patrick Durusau @ 5:10 pm

Introducing CDH4 by Charles Zedlewski.

From the post:

I’m pleased to inform our users and customers that Cloudera has released its 4th version of Cloudera’s Distribution Including Apache Hadoop (CDH) into beta today. This release combines the input from our enterprise customers, partners and users with the hard work of Cloudera engineering and the larger Apache open source community to create what we believe is a compelling advance for this widely adopted platform.

There are a great many improvements and new capabilities in CDH4 compared to CDH3. Here is a high level list of what’s available for you to test in this first beta release:

  • Availability – a high availability namenode, better job isolation, hard drive failure handling, and multi-version support
  • Utilization – multiple namespaces, co-processors and a slot-less resource management model
  • Performance – improvements in HBase, HDFS, MapReduce and compression performance
  • Usability – broader BI support, expanded API access, unified file formats & compression codecs
  • Security – scheduler ACLs

Some items of note about this beta:

This is the first beta for CDH4. We plan to do a second beta some weeks after the first beta. The second beta will roll in updates to Apache Flume, Apache Sqoop, Hue, Apache Oozie and Apache Whirr that did not make the first beta. It will also broaden the platform support back out to our normal release matrix of Red Hat, CentOS, SUSE, Ubuntu and Debian. Our plan is for this second beta to have the last significant component changes before CDH goes GA.

Some CDH components are getting substantial revamps and we have transition plans for these. There is a significantly redesigned MapReduce (aka MR2) with a similar API to the old MapReduce but with new daemons, user interface and more. MR2 is part of CDH4, but we also decided it makes sense to ship with the MapReduce from CDH3 which is widely used, thoroughly debugged and stable. We will support both generations of MapReduce for the life of CDH4, which will allow customers and users to take advantage of all of the new CDH4 features while making the transition to the new MapReduce in a timeframe that makes sense for them.

The only better time to be in data mining, information retrieval, or data analysis is next week. 😉

November 20, 2011

Jeff Hammerbacher on Experiences Evolving a New Analytical Platform

Filed under: Crunch,Dremel,Dryad,Flume,Giraph,HBase,HDFS,Hive,JDBC,MapReduce,ODBC,Oozie,Pregel — Patrick Durusau @ 4:21 pm

Jeff Hammerbacher on Experiences Evolving a New Analytical Platform

Slides from Jeff’s presentation and numerous references, including to a live blogging summary by Jeff Dalton.

In terms of the new analytical platform, I would strongly suggest that you take Cloudera’s substrate:

Cloudera starts with a substrate architecture of Open Compute commodity Linux servers configured using Puppet and Chef and coordinated using ZooKeeper. Naturally this entire stack is open-source. They use HDFS and Ceph to provide distributed, schema-less storage. They offer append-only table storage and metadata using Avro, RCFile, and HCatalog; and mutable table storage and metadata using HBase. For computation, they offer YARN (inter-job scheduling, like Grid Engine, for data intensive computing) and Mesos for cluster resource management; MapReduce, Hamster (MPI), Spark, Dryad / DryadLINQ, Pregel (Giraph), and Dremel as processing frameworks; and Crunch (like Google’s FlumeJava), PigLatin, HiveQL, and Oozie as high-level interfaces. Finally, Cloudera offers tool access through FUSE, JDBC, and ODBC; and data ingest through Sqoop and Flume.

Rather than asking the usual questions, how to make this faster, more storage, etc., all of which are important, ask the more difficult questions:

  1. In or between which of these elements would human analysis/judgment have the greatest impact?
  2. Would human analysis/judgment be best made by experts or crowds?
  3. What sort of interface would elicit the best human analysis/judgment? (visual/aural; contest/game/virtual)
  4. Would feedback or homeostasis mechanisms improve performance?

That is a very crude and uninformed starter set of questions.

Putting higher speed access to more data with better tools at our fingertips expands the questions we can ask of interfaces and our interaction with the data. (Before we ever ask questions of the data.)
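To make the first question a little more concrete, here is a rough sketch that lists every place "in or between" the elements of the quoted stack where human judgment could be inserted. The layer names and the grouping of tools into layers are my own assumptions drawn from the summary above, not Cloudera's terminology, and `candidate_points` is a hypothetical helper.

```python
# Layers of the stack as described in the quoted summary, bottom to top.
# ASSUMPTION: the layer names and groupings are illustrative only.
STACK = [
    ("substrate", ["Open Compute servers", "Puppet", "Chef", "ZooKeeper"]),
    ("storage", ["HDFS", "Ceph"]),
    ("tables", ["Avro", "RCFile", "HCatalog", "HBase"]),
    ("resource management", ["YARN", "Mesos"]),
    ("processing frameworks",
     ["MapReduce", "Hamster", "Spark", "Dryad", "Giraph", "Dremel"]),
    ("high-level interfaces", ["Crunch", "PigLatin", "HiveQL", "Oozie"]),
    ("access and ingest", ["FUSE", "JDBC", "ODBC", "Sqoop", "Flume"]),
]


def candidate_points(stack):
    """List each layer ("in") and each boundary between adjacent layers ("between")."""
    names = [name for name, _tools in stack]
    within = [("in", name) for name in names]
    between = [("between", f"{lower} / {upper}")
               for lower, upper in zip(names, names[1:])]
    return within + between


if __name__ == "__main__":
    for kind, where in candidate_points(STACK):
        print(f"{kind:8} {where}")
```

Even at this coarse grain there are seven layers and six boundaries to consider, and a finer grouping would produce more.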
