Archive for the ‘Zookeeper’ Category

OrientDB becomes distributed…

Friday, November 8th, 2013

OrientDB becomes distributed using Hazelcast, leading open source in-memory data grid

From the post:

Hazelcast and Orient Technologies today announced that OrientDB has gained a multi-master replication feature powered by Hazelcast.

Clustering multiple server nodes is the most significant feature of OrientDB 1.6. Databases can be replicated across heterogeneous server nodes in multi-master mode achieving the best of scalability and performance.

“I think one of the added value of OrientDB against all the NoSQL products is the usage of Hazelcast while most of the others use Yahoo ZooKeeper to manage the cluster (discovery, split brain network, etc) and something else for the transport layer.” said Luca Garulli, CEO of Orient Technologies. “With ZooKeeper configuration is a nightmare, while Hazelcast let you to add OrientDB servers with ZERO configuration. This has been a big advantage for our clients and everything is much more ‘elastic’, specially when deployed on the Cloud. We’ve used Hazelcast not only for the auto-discovery, but also for the transport layer. Thanks to this new architecture all our clients can scale up horizontally by adding new servers without stopping or reconfigure the cluster”.

“We are amazed by the speed with which OrientDB has adopted Hazelcast and we are delighted to see such excellent technologists teaming up with Hazelcast.” said Talip Ozturk, CEO of Hazelcast. “We work hard to make the best open source in-memory data grid on the market and are happy to see it being used in this way.” (emphasis added)

Just yesterday I was writing about configuration issues in the Hadoop ecosystem, which includes ZooKeeper: Hadoop Ecosystem Configuration Woes?

Where there is smoke, is there fire?

Hadoop Weekly – October 28, 2013

Tuesday, October 29th, 2013

Hadoop Weekly – October 28, 2013 by Joe Crobak.

A weekly blog post that tracks all things in the Hadoop ecosystem.

I will keep posting on Hadoop items of particular interest for topic maps, but will also point to this blog for those who want or need more Hadoop coverage.

Apache Bigtop: The “Fedora of Hadoop”…

Wednesday, June 26th, 2013

Apache Bigtop: The “Fedora of Hadoop” is Now Built on Hadoop 2.x by Roman Shaposhnik.

From the post:

Just in time for Hadoop Summit 2013, the Apache Bigtop team is very pleased to announce the release of Bigtop 0.6.0: The very first release of a fully integrated Big Data management distribution built on the currently most advanced Hadoop 2.x, Hadoop 2.0.5-alpha.

Bigtop, as many of you might already know, is a project aimed at creating a 100% open source and community-driven Big Data management distribution based on Apache Hadoop. (You can learn more about it by reading one of our previous blog posts on Apache Blogs.) Bigtop also plays an important role in CDH, which utilizes its packaging code from Bigtop — Cloudera takes pride in developing open source packaging code and contributing the same back to the community.

The very astute readers of this blog will notice that given our quarterly release schedule, Bigtop 0.6.0 should have been called Bigtop 0.7.0. It is true that we skipped a quarter. Our excuse is that we spent all this extra time helping the Hadoop community stabilize the Hadoop 2.x code line and making it a robust kernel for all the applications that are now part of the Bigtop distribution.

And speaking of applications, we haven’t forgotten to grow the Bigtop family: Bigtop 0.6.0 adds Apache HCatalog and Apache Giraph to the mix. The full list of Hadoop applications available as part of the Bigtop 0.6.0 release is:

  • Apache Zookeeper 3.4.5
  • Apache Flume 1.3.1
  • Apache HBase 0.94.5
  • Apache Pig 0.11.1
  • Apache Hive 0.10.0
  • Apache Sqoop 2 (AKA 1.99.2)
  • Apache Oozie 3.3.2
  • Apache Whirr 0.8.2
  • Apache Mahout 0.7
  • Apache Solr (SolrCloud) 4.2.1
  • Apache Crunch (incubating) 0.5.0
  • Apache HCatalog 0.5.0
  • Apache Giraph 1.0.0
  • LinkedIn DataFu 0.0.6
  • Cloudera Hue 2.3.0

And we were just talking about YARN and applications, weren’t we? 😉

Enjoy!

(Participate if you can but at least send a note of appreciation to Cloudera.)

ZooKeeper 3.4.4 is Now Available

Monday, September 24th, 2012

ZooKeeper 3.4.4 is Now Available by Mahadev Konar.

From the post:

Apache ZooKeeper release 3.4.4 is now available. This is a bug fix release including 50 bug fixes. Following is a summary of the critical issues fixed in the release.

Cool!

New ‘The Future of Apache Hadoop’ Season!

Wednesday, September 5th, 2012

OK, the real title is: Four New Installments in ‘The Future of Apache Hadoop’ Webinar Series

From the post:

During the ‘Future of Apache Hadoop’ webinar series, Hortonworks founders and core committers will discuss the future of Hadoop and related projects including Apache Pig, Apache Ambari, Apache Zookeeper and Apache Hadoop YARN.

Apache Hadoop has rapidly evolved to become the leading platform for managing, processing and analyzing big data. Consequently there is a thirst for knowledge on the future direction for Hadoop related projects. The Hortonworks webinar series will feature core committers of the Apache projects discussing the essential components required in a Hadoop Platform, current advances in Apache Hadoop, relevant use-cases and best practices on how to get started with the open source platform. Each webinar will include a live Q&A with the individuals at the center of the Apache Hadoop movement.

Coming to a computer near you:

  • Pig Out on Hadoop (Alan Gates): Wednesday, September 12 at 10:00 a.m. PT / 1:00 p.m. ET
  • Deployment and Management of Hadoop Clusters with Ambari (Matt Foley): Wednesday, September 26 at 10:00 a.m. PT / 1:00 p.m. ET
  • Scaling Apache Zookeeper for the Next Generation of Hadoop Applications (Mahadev Konar): Wednesday, October 17 at 10:00 a.m. PT / 1:00 p.m. ET
  • YARN: The Future of Data Processing with Apache Hadoop (Arun C. Murthy): Wednesday, October 31 at 10:00 a.m. PT / 1:00 p.m. ET

Registration is open so get it on your calendar!

Hortonworks Data Platform v1.0 Download Now Available

Thursday, June 21st, 2012

Hortonworks Data Platform v1.0 Download Now Available

From the post:

If you haven’t yet noticed, we have made Hortonworks Data Platform v1.0 available for download from our website. Previously, Hortonworks Data Platform was only available for evaluation for members of the Technology Preview Program or via our Virtual Sandbox (hosted on Amazon Web Services). Moving forward and effective immediately, Hortonworks Data Platform is available to the general public.

Hortonworks Data Platform is a 100% open source data management platform, built on Apache Hadoop. As we have stated on many occasions, we are absolutely committed to the Apache Hadoop community and the Apache development process. As such, all code developed by Hortonworks has been contributed back to the respective Apache projects.

Version 1.0 of Hortonworks Data Platform includes Apache Hadoop-1.0.3, the latest stable line of Hadoop as defined by the Apache Hadoop community. In addition to the core Hadoop components (including MapReduce and HDFS), we have included the latest stable releases of essential projects including HBase 0.92.1, Hive 0.9.0, Pig 0.9.2, Sqoop 1.4.1, Oozie 3.1.3 and Zookeeper 3.3.4. All of the components have been tested and certified to work together. We have also added tools that simplify the installation and configuration steps in order to improve the experience of getting started with Apache Hadoop.

I’m a member of the general public! And you probably are too! 😉

See the rest of the post for more goodies that are included with this release.

Apache Bigtop 0.3.0 (incubating) has been released

Wednesday, April 4th, 2012

Apache Bigtop 0.3.0 (incubating) has been released by Roman Shaposhnik.

From the post:

Apache Bigtop 0.3.0 (incubating) is now available. This is the first fully integrated, community-driven, 100% Apache Big Data management distribution based on Apache Hadoop 1.0. In addition to a major change in the Hadoop version, all of the Hadoop ecosystem components have been upgraded to the latest stable versions and thoroughly tested:

  • Apache Hadoop 1.0.1
  • Apache Zookeeper 3.4.3
  • Apache HBase 0.92.0
  • Apache Hive 0.8.1
  • Apache Pig 0.9.2
  • Apache Mahout 0.6.1
  • Apache Oozie 3.1.3
  • Apache Sqoop 1.4.1
  • Apache Flume 1.0.0
  • Apache Whirr 0.7.0

Thoughts on what is missing from this ecosystem?

What if you left the company where you wrote the scripts, and they needed new ones?

Re-write? On what basis?

Is your “big data” big enough to need “big documentation”?

Apache ZooKeeper 3.3.5 has been released

Saturday, March 24th, 2012

Apache ZooKeeper 3.3.5 has been released by Patrick Hunt.

From the post:

Apache ZooKeeper release 3.3.5 is now available. This is a bug fix release covering 11 issues, two of which were considered blockers. Some of the more serious issues include:

  • ZOOKEEPER-1367 Data inconsistencies and unexpired ephemeral nodes after cluster restart
  • ZOOKEEPER-1412 Java client watches inconsistently triggered on reconnect
  • ZOOKEEPER-1277 Servers stop serving when lower 32bits of zxid roll over
  • ZOOKEEPER-1309 Creating a new ZooKeeper client can leak file handles
  • ZOOKEEPER-1389 It would be nice if start-foreground used exec $JAVA in order to get rid of the intermediate shell process
  • ZOOKEEPER-1089 zkServer.sh status does not work due to invalid option of nc

Stability, Compatibility and Testing

3.3.5 is a stable release that’s fully backward compatible with 3.3.4. Only bug fixes relative to 3.3.4 have been applied. Version 3.3.5 will be incorporated into the upcoming CDH3U4 release.

Just in case you are curious, ZOOKEEPER-1367 and ZOOKEEPER-1412 were the blocking issues. I would have thought leaking file handles (ZOOKEEPER-1309) would be as well. It’s fixed now, but I am curious about the basis for classifying issues. (Not entirely academic, since the ODF TC uses JIRA, after a fashion, to track issues with revisions of the standard.)
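For readers wondering why ZOOKEEPER-1277 matters: a ZooKeeper transaction id (zxid) is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch counter. The following is a minimal Python sketch of that layout. The helper names are my own, not anything from the ZooKeeper code base, and it only illustrates what a naive increment does when the counter half is full.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # 0xFFFFFFFF


def make_zxid(epoch, counter):
    """Pack an epoch and a counter into one 64-bit zxid (hypothetical helper)."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def split_zxid(zxid):
    """Return (epoch, counter) for a 64-bit zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


def counter_exhausted(zxid):
    """True when the next transaction would overflow the counter half."""
    return (zxid & COUNTER_MASK) == COUNTER_MASK


last = make_zxid(epoch=5, counter=COUNTER_MASK)
naive_next = last + 1  # a plain increment spills into the epoch half

print(split_zxid(last))         # (5, 4294967295)
print(split_zxid(naive_next))   # (6, 0)
print(counter_exhausted(last))  # True
```

The plain increment changes the epoch half even though no new leader was elected, which is why a rollover needs special handling. How the real servers deal with it is described in the JIRA issue, not in this sketch.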

Apache ZooKeeper 3.4.3 has been released

Thursday, February 16th, 2012

Apache ZooKeeper 3.4.3 has been released by Patrick Hunt.

From the post:

Apache ZooKeeper release 3.4.3 is now available. This is a bug fix release covering 18 issues, one of which was considered a blocker.

ZooKeeper 3.4 is incorporated into CDH4 and now available in beta 1!

ZOOKEEPER-1367 is the most serious of the issues addressed, it could cause data corruption on restart. This version also adds support for compiling the client on ARM architectures.

  • ZOOKEEPER-1367 Data inconsistencies and unexpired ephemeral nodes after cluster restart
  • ZOOKEEPER-1343 getEpochToPropose should check if lastAcceptedEpoch is greater or equal than epoch
  • ZOOKEEPER-1373 Hardcoded SASL login context name clashes with Hadoop security configuration override
  • ZOOKEEPER-1089 zkServer.sh status does not work due to invalid option of nc
  • ZOOKEEPER-973 bind() could fail on Leader because it does not setReuseAddress on its ServerSocket
  • ZOOKEEPER-1374 C client multi-threaded test suite fails to compile on ARM architectures.
  • ZOOKEEPER-1348 Zookeeper 3.4.2 C client incorrectly reports string version of 3.4.1

If you are running 3.4.2 or earlier, be sure to upgrade immediately. See my earlier post for details on what’s new in 3.4.

From the Apache ZooKeeper homepage:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

Just in case you hope to manage distributed applications some day: ZooKeeper should be on your resume.

Apache Zookeeper 3.3.4

Wednesday, November 30th, 2011

Apache Zookeeper 3.3.4

From the post:

Apache ZooKeeper release 3.3.4 is now available: this is a fix release covering 22 issues, 9 of which were considered blockers. Some of the more serious issues include:

  • ZOOKEEPER-1208 Ephemeral nodes may not be removed after the client session is invalidated
  • ZOOKEEPER-961 Watch recovery fails after disconnection when using chroot connection option
  • ZOOKEEPER-1049 Session expire/close flooding renders heartbeats to delay significantly
  • ZOOKEEPER-1156 Log truncation truncating log too much – can cause data loss
  • ZOOKEEPER-1046 Creating a new sequential node incorrectly results in a ZNODEEXISTS error
  • ZOOKEEPER-1097 Quota is not correctly rehydrated on snapshot reload
  • ZOOKEEPER-1117 zookeeper 3.3.3 fails to build with gcc >= 4.6.1 on Debian/Ubuntu

In case you are unfamiliar with Zookeeper:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed. (from Apache Zookeeper)
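Two of the fixes listed above (ZOOKEEPER-1208 and ZOOKEEPER-1046) concern ephemeral and sequential nodes. For anyone new to those terms, here is a toy in-memory model in Python of what they are supposed to do. `ToyZooKeeper` and its methods are invented for illustration and are not the real client API. The sketch also skips things the real service enforces, such as the parent node having to exist.

```python
class ToyZooKeeper:
    """In-memory stand-in for a znode tree (hypothetical, not the real client)."""

    def __init__(self):
        self.nodes = {}     # path -> owning session id (None if persistent)
        self.counters = {}  # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        if sequential:
            parent = path.rsplit("/", 1)[0]
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"  # 10-digit zero-padded suffix
        if path in self.nodes:
            raise KeyError(f"node exists: {path}")
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        """Ending a session removes every ephemeral node it owns."""
        owned = [p for p, s in self.nodes.items()
                 if s is not None and s == session]
        for path in owned:
            del self.nodes[path]

    def children(self, parent):
        """Names of the direct children of parent, sorted."""
        prefix = parent.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.nodes if p.startswith(prefix))
        return sorted(n for n in names if "/" not in n)


zk = ToyZooKeeper()
zk.create("/workers/host-a", session="s1", ephemeral=True)
zk.create("/workers/host-b", session="s2", ephemeral=True)
first = zk.create("/queue/job-", sequential=True)
second = zk.create("/queue/job-", sequential=True)
zk.close_session("s1")

print(zk.children("/workers"))  # ['host-b']
print(first, second)            # /queue/job-0000000000 /queue/job-0000000001
```

In this model an ephemeral node that outlives its session, or a sequential create that collides with an existing name, would be exactly the kind of misbehavior those two issues describe.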

IBM InfoSphere BigInsights

Friday, June 3rd, 2011

IBM InfoSphere BigInsights

Two items stand out in the usual laundry list of “easy administration” and “IBM supports open source” claims:

The Jaql query language. Jaql, a Query Language for JavaScript Object Notation (JSON), provides the capability to process both structured and non-traditional data. Its SQL-like interface is well suited for quick ramp-up by developers familiar with the SQL language and makes it easier to integrate with relational databases.

….

Integrated installation. BigInsights includes IBM value-added technologies, as well as open source components, such as Hadoop, Lucene, Hive, Pig, Zookeeper, Hbase, and Avro, to name a few.

I guess it must include a “few” things since the 64-bit Linux download is 398 MB.

Just pointing out its availability. More commentary to follow.