Can Spark Streaming survive Chaos Monkey? by Bharat Venkat, Prasanna Padmanabhan, Antony Arokiasamy, Raju Uppalap.
From the post:
Netflix is a data-driven organization that places emphasis on the quality of data collected and processed. In our previous blog post, we highlighted our use cases for real-time stream processing in the context of online recommendations and data monitoring. With Spark Streaming as our choice of stream processor, we set out to evaluate and share the resiliency story for Spark Streaming in the AWS cloud environment. A Chaos Monkey based approach, which randomly terminated instances or processes, was employed to simulate failures.
Spark on Amazon Web Services (AWS) is relevant to us as Netflix delivers its service primarily out of the AWS cloud. Stream processing systems need to be operational 24/7 and be tolerant to failures. Instances on AWS are ephemeral, which makes it imperative to ensure Spark’s resiliency.
…
If Spark were a commercial product, this is where you would see in bold: not a vendor report, from a customer.
You need to see the post for the details, but so you know what to expect:
| Component | Type | Behaviour on Component Failure | Resilient |
|---|---|---|---|
| Driver | Process | Client Mode: The entire application is killed | |
| Driver | Process | Cluster Mode with supervise: The Driver is restarted on a different Worker node | |
| Master | Process | Single Master: The entire application is killed | |
| Master | Process | Multi Master: A STANDBY master is elected ACTIVE | |
| Worker Process | Process | All child processes (executor or driver) are also terminated and a new worker process is launched | |
| Executor | Process | A new executor is launched by the Worker process | |
| Receiver | Thread(s) | Same as Executor as they are long running tasks inside the Executor | |
| Worker Node | Node | Worker, Executor and Driver processes run on Worker nodes and the behavior is same as killing them individually | |
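To make the Executor entry in the summary above concrete, here is a toy simulation of the idea: something kills an executor at random, and the supervising worker launches a replacement. It is only a sketch under loud assumptions. The names `Worker`, `chaos_round` and `run_chaos` are hypothetical, nothing here calls Spark or Chaos Monkey, and it is not the harness Netflix used.

```python
import random


class Worker:
    """Toy stand-in for a Spark Worker process that supervises executors.

    Hypothetical model for illustration only; it does not use Spark.
    """

    def __init__(self, num_executors):
        self._next_id = 0
        self.target = num_executors
        self.executors = set()
        self.relaunches = 0
        for _ in range(num_executors):
            self._launch()

    def _launch(self):
        # Each executor gets a fresh id; ids are never reused.
        self.executors.add(self._next_id)
        self._next_id += 1

    def kill(self, executor_id):
        """Simulate an executor being terminated by the chaos test."""
        self.executors.discard(executor_id)

    def reconcile(self):
        """Launch replacements until the target executor count is restored."""
        while len(self.executors) < self.target:
            self._launch()
            self.relaunches += 1


def chaos_round(worker, rng):
    """Kill one randomly chosen executor, then let the worker recover."""
    victim = rng.choice(sorted(worker.executors))
    worker.kill(victim)
    worker.reconcile()
    return victim


def run_chaos(rounds=5, num_executors=3, seed=0):
    """Run several kill-and-recover rounds; return the worker and the victims."""
    rng = random.Random(seed)
    worker = Worker(num_executors)
    killed = [chaos_round(worker, rng) for _ in range(rounds)]
    return worker, killed


if __name__ == "__main__":
    worker, killed = run_chaos()
    print("killed executors:", killed)
    print("live executors:", sorted(worker.executors))
    print("relaunches:", worker.relaunches)
```

After every round the worker is back at its target executor count, which is the property a chaos test of this kind would check for. The real behaviour, of course, depends on Spark and the cluster manager, not on this model.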
I can think of few things more annoying than software that works, sometimes. If you want users to rely upon you, then your service will have to be reliable.
A performance post by Netflix is rumored to be in the offing!
Enjoy!