Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 7, 2015

Signatures, patterns and trends: Timeseries data mining at Etsy

Filed under: Cybersecurity,Pattern Recognition,Streams,Time Series — Patrick Durusau @ 8:56 pm

From the description:

Etsy loves metrics. Everything that happens in our data centres gets recorded, graphed and stored. But with over a million metrics flowing in constantly, it’s hard for any team to keep on top of all that information. Graphing everything doesn’t scale, and traditional alerting methods based on thresholds become very prone to false positives.

That’s why we started Kale, an open-source software suite for pattern mining and anomaly detection in operational data streams. These are big topics with decades of research, but many of the methods in the literature are ineffective on terabytes of noisy data with unusual statistical characteristics, and techniques that require extensive manual analysis are unsuitable when your ops teams have service levels to maintain.

In this talk I’ll briefly cover the main challenges that traditional statistical methods face in this environment, and introduce some pragmatic alternatives that scale well and are easy to implement (and automate) on Elasticsearch and similar platforms. I’ll talk about the stumbling blocks we encountered with the first release of Kale, and the resulting architectural changes coming in version 2.0. And I’ll go into a little technical detail on the algorithms we use for fingerprinting and searching metrics, and detecting different kinds of unusual activity. These techniques have potential applications in clustering, outlier detection, similarity search and supervised learning, and they are not limited to the data centre but can be applied to any high-volume timeseries data.

Blog post: https://codeascraft.com/2013/06/11/introducing-kale/

Signatures, patterns and trends? Sounds relevant to monitoring network patterns. Yes?

Good focus on anomaly detection, pointing out that many explanations are overly simplistic.
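To make the false-positive complaint about threshold alerting concrete, here is a minimal sketch. It is not Kale's algorithm, which the description does not spell out. It compares a fixed threshold against a rolling median/MAD check, a common robust technique I am substituting for illustration. The function names, the sample series, the window size and the multiplier `k` are all assumptions of mine.

```python
from statistics import median


def static_threshold_alerts(series, limit):
    """Return indices where the value exceeds a fixed limit."""
    return [i for i, x in enumerate(series) if x > limit]


def rolling_mad_alerts(series, window=5, k=5.0):
    """Return indices where a value deviates from the median of the
    preceding `window` points by more than k times that window's
    median absolute deviation (MAD).

    Illustrative only: window and k are arbitrary assumptions.
    """
    alerts = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        med = median(past)
        mad = median(abs(x - med) for x in past)
        scale = mad if mad > 0 else 1e-9  # avoid a zero scale on flat data
        if abs(series[i] - med) > k * scale:
            alerts.append(i)
    return alerts


# Hypothetical metric: ordinary noise around 101 with one real spike.
series = [100, 102, 99, 101, 103, 100, 102, 101, 160, 102, 100, 101]

print(static_threshold_alerts(series, 101.5))  # fires on routine noise too
print(rolling_mad_alerts(series))              # fires on the spike only
```

On this made-up series the fixed limit of 101.5 fires five times, four of them on ordinary noise, while the rolling check flags only the spike. Whether anything this simple survives a million real metrics is exactly the question the talk takes up.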

The use case is over one (1) million incoming metrics.

Looking forward to seeing this released as open source!
