Apache™ Spark™ v1.0

From the post:

The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 170 Open Source projects and initiatives, announced today the availability of Apache Spark v1.0, the super-fast, Open Source large-scale data processing and advanced analytics engine.

Apache Spark has been dubbed a “Hadoop Swiss Army knife” for its remarkable speed and ease of use, allowing developers to quickly write applications in Java, Scala, or Python, using its built-in set of over 80 high-level operators. With Spark, programs can run up to 100x faster than Apache Hadoop MapReduce in memory.

“1.0 is a huge milestone for the fast-growing Spark community. Every contributor and user who’s helped bring Spark to this point should feel proud of this release,” said Matei Zaharia, Vice President of Apache Spark.

Apache Spark is well-suited for machine learning, interactive queries, and stream processing. It is 100% compatible with Hadoop’s Distributed File System (HDFS), HBase, Cassandra, as well as any Hadoop storage system, making existing data immediately usable in Spark. In addition, Spark supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms out-of-the-box.

New in v1.0, Apache Spark offers strong API stability guarantees (backward-compatibility throughout the 1.X series), a new Spark SQL component for accessing structured data, as well as richer integration with other Apache projects (Hadoop YARN, Hive, and Mesos).

Spark Homepage.

A somewhat more technical note on the release, from the project:

Spark 1.0.0 is a major release marking the start of the 1.X line. This release brings both a variety of new features and strong API compatibility guarantees throughout the 1.X line. Spark 1.0 adds a new major component, Spark SQL, for loading and manipulating structured data in Spark. It includes major extensions to all of Spark’s existing standard libraries (ML, Streaming, and GraphX) while also enhancing language support in Java and Python. Finally, Spark 1.0 brings operational improvements including full support for the Hadoop/YARN security model and a unified submission process for all supported cluster managers.

You can download Spark 1.0.0 as either a source package (5 MB tgz) or a prebuilt package for Hadoop 1 / CDH3, CDH4, or Hadoop 2 / CDH5 / HDP2 (160 MB tgz). Release signatures and checksums are available at the official Apache download site.
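If the phrase "high-level operators" in the announcement seems abstract, a word count is the usual illustration. Spark's RDD API has operators named flatMap, map, and reduceByKey; the sketch below only imitates that style in plain Python on a local list, so it runs without Spark installed. The helper names flat_map, reduce_by_key, and word_count are hypothetical stand-ins of my own, not Spark's API, and nothing here is distributed or lazy the way Spark is.

```python
from collections import defaultdict
from functools import reduce


def flat_map(func, items):
    """Apply func to each item and flatten the resulting lists into one list."""
    return [out for item in items for out in func(item)]


def reduce_by_key(func, pairs):
    """Combine all values that share a key using the binary function func."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def word_count(lines):
    """Count words by chaining operator-style steps (hypothetical helpers)."""
    words = flat_map(lambda line: line.lower().split(), lines)  # split lines into words
    pairs = [(word, 1) for word in words]                       # pair each word with a 1
    return reduce_by_key(lambda a, b: a + b, pairs)             # sum the 1s per word


if __name__ == "__main__":
    sample = ["Spark is fast", "Spark is general"]
    print(word_count(sample))
```

In Spark itself the same three steps would be chained on a distributed dataset rather than a Python list, which is where the brevity pays off.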

What a nice way to start the weekend!

I first saw this in a tweet by Sean Owen.
