After listening to Kathleen Ting (Cloudera) describe how 44% of support tickets for the Hadoop ecosystem arise from misconfiguration (Dealing with Data in the Hadoop Ecosystem…), I started to wonder how many opportunities for misconfiguration there are in the Hadoop ecosystem.
That’s probably not an answerable question, but we can look at how configurations are documented in the Hadoop ecosystem:
Comment syntax in the Hadoop ecosystem:
- Accumulo – XML <!-- comment -->
- Avro – Schemas defined in JSON (no comment facility)
- Cassandra – “#” comment indicator
- Chukwa – XML <!-- comment -->
- Falcon – XML <!-- comment -->
- Flume – “#” comment indicator
- Hadoop – XML <!-- comment -->
- Hama – XML <!-- comment -->
- HBase – XML <!-- comment -->
- Hive – XML <!-- comment -->
- Knox – XML <!-- comment -->
- Mahout – XML <!-- comment -->
- Pig – C-style comments
- Sqoop – “#” comment indicator
- Tez – XML <!-- comment -->
- ZooKeeper – text but no apparent ability to comment (ZooKeeper Administrator’s Guide)
I read that to mean:
1 component, Pig, uses C-style comments.
2 components, Avro and ZooKeeper, have no ability for comments at all.
3 components, Cassandra, Flume and Sqoop, use “#” for comments.
10 components, Accumulo, Chukwa, Falcon, Hama, Hadoop, HBase, Hive, Knox, Mahout and Tez, presumably support XML comments.
More than one third of the Hadoop ecosystem (6 of the 16 components listed) uses non-XML comments, if comments are permitted at all. The rest of the ecosystem uses XML comments in some files and not others.
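The tally is easy to reproduce. Here is a minimal Python sketch that transcribes the list above and counts the styles; the labels (`xml`, `hash`, `c-style`, `none`) are my own shorthand, not names any of these projects use.

```python
from collections import Counter

# Comment style per component, transcribed from the list above.
# The style labels are shorthand invented for this sketch.
COMMENT_STYLE = {
    "Accumulo": "xml",
    "Avro": "none",
    "Cassandra": "hash",
    "Chukwa": "xml",
    "Falcon": "xml",
    "Flume": "hash",
    "Hadoop": "xml",
    "Hama": "xml",
    "HBase": "xml",
    "Hive": "xml",
    "Knox": "xml",
    "Mahout": "xml",
    "Pig": "c-style",
    "Sqoop": "hash",
    "Tez": "xml",
    "ZooKeeper": "none",
}

counts = Counter(COMMENT_STYLE.values())
total = len(COMMENT_STYLE)
non_xml = total - counts["xml"]

# Print one line per style, most common first, then the non-XML share.
for style, n in counts.most_common():
    print(f"{style:8} {n:2d}")
print(f"non-XML: {non_xml} of {total} ({non_xml / total:.1%})")
```

Six of sixteen is 37.5%, a little over one third.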
The entire ecosystem lacks a standard way to associate values or settings in one component with values or settings in another component.
To say nothing of associating values or settings with releases of different components.
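To make the gap concrete, here is a rough sketch of the kind of cross-component check that has to be built by hand today. It assumes one real pairing, HBase's `hbase.zookeeper.property.clientPort` and ZooKeeper's `clientPort`; the file contents, the `LINKS` table and the helper names are all hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of an hbase-site.xml file.
HBASE_SITE = """
<configuration>
  <!-- should agree with clientPort in zoo.cfg -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
"""

# Hypothetical zoo.cfg contents (plain key=value lines).
ZOO_CFG = """
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2182
"""


def read_hadoop_xml(text):
    """Parse Hadoop-style <property><name/><value/></property> XML into a dict."""
    root = ET.fromstring(text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def read_key_value(text):
    """Parse key=value lines into a dict, skipping blank lines."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Hand-maintained table of settings that are supposed to agree.
# Nothing in the ecosystem provides this; it is an assumption of the sketch.
LINKS = [
    (("hbase", "hbase.zookeeper.property.clientPort"), ("zookeeper", "clientPort")),
]


def find_mismatches(configs, links):
    """Return (key_a, value_a, key_b, value_b) for every linked pair that differs."""
    mismatches = []
    for (comp_a, key_a), (comp_b, key_b) in links:
        value_a = configs[comp_a].get(key_a)
        value_b = configs[comp_b].get(key_b)
        if value_a != value_b:
            mismatches.append((key_a, value_a, key_b, value_b))
    return mismatches


configs = {
    "hbase": read_hadoop_xml(HBASE_SITE),
    "zookeeper": read_key_value(ZOO_CFG),
}
mismatches = find_mismatches(configs, LINKS)
for key_a, value_a, key_b, value_b in mismatches:
    print(f"{key_a}={value_a} disagrees with {key_b}={value_b}")
```

Note that the only record of the relationship inside the files themselves is an XML comment on one side, which no tool reads and the other side cannot reciprocate.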
Without looking at the details of the possible settings for each component, does that seem problematic to you?