Apache Flink (formerly Stratosphere) Competitor to Spark

From the Apache Flink 0.6 release page:

What is Flink?

Apache Flink is a general-purpose data processing engine for clusters. It runs on YARN clusters on top of data stored in Hadoop, as well as stand-alone. Flink currently has programming APIs in Java and Scala. Jobs are executed via Flink's own runtime engine. Flink features:

Robust in-memory and out-of-core processing: once read, data stays in memory as much as possible, and is gracefully de-staged to disk in the presence of memory pressure from limited memory or other applications. The runtime is designed to perform very well both in setups with abundant memory and in setups where memory is scarce.

POJO-based APIs: when programming, you do not have to pack your data into key-value pairs or some other framework-specific data model. Rather, you can use arbitrary Java and Scala types to model your data.
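An aside from me, not from the release page: a minimal sketch of what "use your own types" means in practice. It uses plain java.util.stream rather than Flink's API, and the PageVisit class and its fields are hypothetical names invented for illustration. The point is only that the grouping key is a field of an ordinary class, not the first slot of a key-value pair.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PojoSketch {
    // A plain Java type (hypothetical, for illustration only).
    public static class PageVisit {
        public String url;
        public String country;
        public long durationMs;

        public PageVisit() {}

        public PageVisit(String url, String country, long durationMs) {
            this.url = url;
            this.country = country;
            this.durationMs = durationMs;
        }
    }

    // Group by one field of the POJO and sum another field.
    public static Map<String, Long> totalDurationByCountry(List<PageVisit> visits) {
        return visits.stream().collect(
            Collectors.groupingBy(v -> v.country,
                                  Collectors.summingLong(v -> v.durationMs)));
    }

    public static void main(String[] args) {
        List<PageVisit> visits = Arrays.asList(
            new PageVisit("/a", "DE", 100L),
            new PageVisit("/b", "DE", 50L),
            new PageVisit("/a", "US", 10L));
        System.out.println(totalDurationByCountry(visits));
    }
}
```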

Efficient iterative processing: Flink contains explicit "iterate" operators that enable very efficient loops over data sets, e.g., for machine learning and graph applications.
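Again my own aside, not the release page: a rough sketch of the kind of loop an "iterate" operator is meant to express. A step function is applied to the whole data set over and over until nothing changes or an iteration cap is hit. This is plain single-machine Java, not Flink's operator; the connectedComponents method and its parameters are my own invention, and min-label propagation is simply a convenient graph example.

```java
import java.util.Arrays;

class IterateSketch {
    // Each vertex starts with its own id as label. In every iteration, both
    // endpoints of each edge adopt the smaller of their two current labels.
    // The loop stops when an iteration changes nothing or the cap is reached.
    public static int[] connectedComponents(int numVertices, int[][] edges, int maxIterations) {
        int[] labels = new int[numVertices];
        for (int v = 0; v < numVertices; v++) {
            labels[v] = v;
        }
        for (int i = 0; i < maxIterations; i++) {
            boolean changed = false;
            int[] next = labels.clone();   // the "next" data set of this iteration
            for (int[] e : edges) {
                int min = Math.min(labels[e[0]], labels[e[1]]);
                if (min < next[e[0]]) { next[e[0]] = min; changed = true; }
                if (min < next[e[1]]) { next[e[1]] = min; changed = true; }
            }
            labels = next;
            if (!changed) {
                break;                     // converged
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1}, {1, 2}, {3, 4} };
        System.out.println(Arrays.toString(connectedComponents(5, edges, 10)));
    }
}
```

With the edges above, vertices 0 to 2 end up sharing one label and vertices 3 and 4 another. An engine with a native iteration operator can keep the data set in place between rounds instead of scheduling a fresh job each time, which is presumably where the claimed efficiency comes from.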

A modular system stack: Flink is not a direct implementation of its APIs but a layered system. All programming APIs are translated to an intermediate program representation that is compiled and optimized via a cost-based optimizer. Lower-level layers of Flink also expose programming APIs for extending the system.

Data pipelining/streaming: Flink's runtime is designed as a pipelined data processing engine rather than a batch processing engine. Operators do not wait for their predecessors to finish in order to start processing data. This results in very efficient handling of large data sets.
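One more aside of mine, as a loose analogy only: a sequential java.util.stream pipeline also pushes each record through all of its stages before touching the next record, so downstream stages start working before upstream ones have seen the whole input. This is a single-threaded illustration, not Flink's distributed runtime, and the stage names "parse", "filter" and "sink" are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineSketch {
    // Records the order in which the stages see each record.
    public static List<String> runPipeline(List<Integer> input) {
        List<String> log = new ArrayList<>();
        input.stream()
             .map(x -> { log.add("parse " + x); return x * 2; })
             .filter(x -> { log.add("filter " + x); return x > 2; })
             .forEach(x -> log.add("sink " + x));
        return log;
    }

    public static void main(String[] args) {
        for (String event : runPipeline(Arrays.asList(1, 2))) {
            System.out.println(event);
        }
    }
}
```

For the input 1, 2 the log interleaves the stages (parse 1, filter 2, parse 2, filter 4, sink 4) rather than finishing all the parsing first, which is the behavior the paragraph above contrasts with batch-style execution.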

The latest version is Apache Flink 0.6.1.

See more information at the incubator homepage. Or consult the Apache Flink mailing lists.

The Quickstart is…, wait for it: word count on Hamlet. Nothing against the Bard, but you do know that everyone dies at the end. Yes? Seems like a depressing example.
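For anyone who has not met the example before, the computation behind a word count is roughly the following. This is a plain Java sketch of the idea, not the Quickstart's code and not Flink's API; the class name, the method name and the tokenizing rule (lowercase, split on non-word characters) are my own assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Lowercase the text, split on runs of non-word characters, count each token.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;   // split can yield an empty first token
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("To be, or not to be: that is the question"));
    }
}
```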

What would you suggest as example applications for this type of software?

I first saw this on Danny Bickson’s blog as Apache flink.
