Archive for the ‘Sqoop’ Category

What’s New in Apache Sqoop 1.4.2

Thursday, November 8th, 2012

What’s New in Apache Sqoop 1.4.2 by Jarek Jarcec Cecho.

Jarek highlights the key features and fixes in this release of Apache Sqoop (its first as a top-level project).

Those include:

  • Hadoop 2.0.0 Support
  • Compatibility with Old Connectors
  • Incremental Imports of Free Form Queries
  • Implicit and Explicit Connector Pickup Improvements
  • Exporting Only a Subset of Columns
  • Verbose Logging
  • Hive Imports
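To make a few of those items concrete, here is a minimal sketch (not taken from Jarek's post) that assembles Sqoop command lines for an incremental import of a free-form query and for an export of only some columns. It only builds the argument lists and never runs Sqoop. The JDBC URL, table, column, and directory names are hypothetical; the flags are the ones I understand the Sqoop 1.4.x command line to accept, so check them against the user guide for your version.

```python
import shlex


def incremental_query_import(jdbc_url, query, check_column, last_value, target_dir):
    """Build argv for an incremental (append) import of a free-form query."""
    # Sqoop expects the literal $CONDITIONS token in a free-form query.
    if "$CONDITIONS" not in query:
        raise ValueError("free-form query must contain the $CONDITIONS placeholder")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--query", query,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
        "--target-dir", target_dir,
        "--num-mappers", "1",  # single mapper, so no --split-by column is needed
        "--verbose",           # verbose logging
    ]


def subset_export(jdbc_url, table, columns, export_dir):
    """Build argv for an export that writes only the listed columns."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--columns", ",".join(columns),
        "--export-dir", export_dir,
    ]


# Hypothetical values, for illustration only.
example = incremental_query_import(
    "jdbc:mysql://db.example.com/shop",
    "SELECT id, total FROM orders WHERE $CONDITIONS",
    "id",
    1000,
    "/data/orders",
)
print(shlex.join(example))
```

Building the list first and quoting it with shlex.join keeps the $CONDITIONS token from being expanded by a shell before Sqoop sees it.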

From the Apache Sqoop homepage:

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Hortonworks Data Platform v1.0 Download Now Available

Thursday, June 21st, 2012

From the post:

If you haven’t yet noticed, we have made Hortonworks Data Platform v1.0 available for download from our website. Previously, Hortonworks Data Platform was only available for evaluation for members of the Technology Preview Program or via our Virtual Sandbox (hosted on Amazon Web Services). Moving forward and effective immediately, Hortonworks Data Platform is available to the general public.

Hortonworks Data Platform is a 100% open source data management platform, built on Apache Hadoop. As we have stated on many occasions, we are absolutely committed to the Apache Hadoop community and the Apache development process. As such, all code developed by Hortonworks has been contributed back to the respective Apache projects.

Version 1.0 of Hortonworks Data Platform includes Apache Hadoop-1.0.3, the latest stable line of Hadoop as defined by the Apache Hadoop community. In addition to the core Hadoop components (including MapReduce and HDFS), we have included the latest stable releases of essential projects including HBase 0.92.1, Hive 0.9.0, Pig 0.9.2, Sqoop 1.4.1, Oozie 3.1.3 and Zookeeper 3.3.4. All of the components have been tested and certified to work together. We have also added tools that simplify the installation and configuration steps in order to improve the experience of getting started with Apache Hadoop.

I’m a member of the general public! And you probably are too! ;-)

See the rest of the post for more goodies that are included with this release.

Sqoop Graduation Meetup

Thursday, April 12th, 2012

Sqoop Graduation Meetup by Kathleen Ting.

From the post:

Cloudera hosted the Apache Sqoop Meetup last week at Cloudera HQ in Palo Alto. About 20 of the Meetup attendees had not used Sqoop before, but were interested enough to participate in the Meetup on April 4th. We believe this healthy interest in Sqoop will contribute to its wide adoption.

Not only was this Sqoop’s second Meetup but also a celebration for Sqoop’s graduation from the Incubator, cementing its status as a Top-Level Project in Apache Software Foundation. Sqoop’s come a long way since its beginnings three years ago as a contrib module for Apache Hadoop submitted by Aaron Kimball. As a result, it was fitting that Aaron gave the first talk of the night by discussing its history: “Sqoop: The Early Days.” From Aaron, we learned that Sqoop’s original name was “SQLImport” and that it was conceived out of his frustration from the inability to easily query both unstructured and structured data at the same time. (Emphasis added.)

I don’t think the extra 20 people were present because of Sqoop.

Did you see the picture of the cake?

My vote goes for the cake as explanation. Yours? ;-)

Congratulations to Sqoop, Sqoop team and community!

Let’s make sure on its first birthday a bigger cake is required!

Apache Bigtop 0.3.0 (incubating) has been released

Wednesday, April 4th, 2012

Apache Bigtop 0.3.0 (incubating) has been released by Roman Shaposhnik.

From the post:

Apache Bigtop 0.3.0 (incubating) is now available. This is the first fully integrated, community-driven, 100% Apache Big Data management distribution based on Apache Hadoop 1.0. In addition to a major change in the Hadoop version, all of the Hadoop ecosystem components have been upgraded to the latest stable versions and thoroughly tested:

  • Apache Hadoop 1.0.1
  • Apache Zookeeper 3.4.3
  • Apache HBase 0.92.0
  • Apache Hive 0.8.1
  • Apache Pig 0.9.2
  • Apache Mahout 0.6.1
  • Apache Oozie 3.1.3
  • Apache Sqoop 1.4.1
  • Apache Flume 1.0.0
  • Apache Whirr 0.7.0

Thoughts on what is missing from this ecosystem?

What if you moved on from the company where you wrote the scripts, and they then needed new ones?

Re-write? On what basis?

Is your “big data” big enough to need “big documentation?”

Apache Sqoop Graduates from Incubator

Tuesday, April 3rd, 2012

Apache Sqoop Graduates from Incubator by Arvind Prabhakar.

From the post:

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.

In its monthly meeting in March of 2012, the board of Apache Software Foundation (ASF) resolved to grant a Top-Level Project status to Apache Sqoop, thus graduating it from the Incubator. This is a significant milestone in the life of Sqoop, which has come a long way since its inception almost three years ago.

For moving data in and out of Hadoop, Sqoop is your friend. Drop by and say hello.
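As a rough sketch of the two directions the post describes (import into HDFS, Hive, or HBase, and export back out to a relational store), the helper below assembles the corresponding Sqoop command lines without running them. This is my illustration, not code from Arvind's post. The function names, JDBC URL, and table names are hypothetical, and the flags are the ones I understand the Sqoop 1.4.x import and export tools to accept.

```python
import shlex


def build_import(jdbc_url, table, target="hdfs", target_dir=None,
                 hbase_table=None, column_family=None):
    """Build argv for importing one table into HDFS, Hive, or HBase."""
    argv = ["sqoop", "import", "--connect", jdbc_url, "--table", table]
    if target == "hdfs":
        if target_dir is not None:
            argv += ["--target-dir", target_dir]
    elif target == "hive":
        argv.append("--hive-import")
    elif target == "hbase":
        if column_family is None:
            raise ValueError("an HBase import needs a column family")
        argv += ["--hbase-table", hbase_table or table,
                 "--column-family", column_family]
    else:
        raise ValueError(f"unknown target: {target!r}")
    return argv


def build_export(jdbc_url, table, export_dir):
    """Build argv for exporting files under export_dir into a table."""
    return ["sqoop", "export", "--connect", jdbc_url,
            "--table", table, "--export-dir", export_dir]


# Hypothetical values, for illustration only.
url = "jdbc:postgresql://warehouse.example.com/sales"
print(shlex.join(build_import(url, "customers", target="hive")))
print(shlex.join(build_export(url, "daily_totals", "/data/daily_totals")))
```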

What’s New in Apache Sqoop 1.4.0-incubating

Tuesday, January 3rd, 2012

New features and improvements in the first incubating release:

If you are interested in learning more about the changes, a complete list for Sqoop 1.4.0-incubating can be found here. You are also encouraged to give this new release a try. Any help and feedback is more than welcome. For more information on how to report problems and to get involved, visit the Sqoop project website at http://incubator.apache.org/sqoop/.

BTW, “Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.” (From Apache Sqoop (incubating))