Archive for the ‘Log Analysis’ Category

Meet Apache Spot… [Additional Malware Requirement: Appear Benign]

Wednesday, September 28th, 2016

Meet Apache Spot, a new open source project for cybersecurity by Katherine Noyes.

From the post:

Hard on the heels of the discovery of the largest known data breach in history, Cloudera and Intel on Wednesday announced that they’ve donated a new open source project to the Apache Software Foundation with a focus on using big data analytics and machine learning for cybersecurity.

Originally created by Intel and launched as the Open Network Insight (ONI) project in February, the effort is now called Apache Spot and has been accepted into the ASF Incubator.

“The idea is, let’s create a common data model that any application developer can take advantage of to bring new analytic capabilities to bear on cybersecurity problems,” Mike Olson, Cloudera co-founder and chief strategy officer, told an audience at the Strata+Hadoop World show in New York. “This is a big deal, and could have a huge impact around the world.”

Essentially, it uses machine learning as a filter to separate bad traffic from benign and to characterize network traffic behavior. It also uses a process including context enrichment, noise filtering, whitelisting and heuristics to produce a shortlist of most likely security threats.

Given the long tail for patch application, see Prioritizing Patch Management Critical to Security, which reads in part:

Patch management – two words that are vital to cybersecurity, but that rarely generate enough attention.

That lack of attention can cost. Recent stats from the Verizon Data Breach report showed that many of the most exploited vulnerabilities in 2014 were nearly a decade old, and some were even more ancient than that. Additional numbers from the NTT Group 2015 Global Threat Intelligence Report revealed that 76 percent of vulnerabilities they observed on enterprise networks in 2014 were two years old or more.

Apache Spot is not an immediate threat to hacking success, but that’s no reason to delay sharpening your malware skills.

Beyond making malware seem benign, have you considered making normal application traffic seem rogue?

When security becomes “too burdensome,” uninformed decision makers may do more damage than hackers.

I know machine learning has improved but I find the use case:


at the very best, implausible. 😉

Thoughts on a test environment to mimic target networks?


Using Solr to Search and Analyze Logs

Tuesday, November 12th, 2013

Using Solr to Search and Analyze Logs by Radu Gheorghe.

From the description:

Since we’ve added Solr output for Logstash, indexing logs via Logstash has become a possibility. But what if you are not using (only) Logstash? Are there other ways you can index logs in Solr? Oh yeah, there are! The following slides are from Lucene Revolution conference that just took place in Dublin where we talked about indexing and searching logs with Solr.

Slides, but a very good set of slides.

Radu’s post reminds me I overlooked logs in the Hadoop eco-system when describing semantic diversity (Hadoop Ecosystem Configuration Woes?).

Or for that matter, how do you link up the logs with particular configuration or job settings?

Emails to the support desk and sticky notes don’t seem equal to the occasion.

Transforming Log Events into Information

Saturday, July 13th, 2013

Transforming Log Events into Information by Matthias Nehlsen.

From the post:

Last week I was dealing with an odd behavior of the chat application demo I was running for this article. The issue was timing-related and there were no actual exceptions that would have helped in identifying the problem. How are you going to even notice spikes and pauses in potentially thousands of lines in a logfile? I was upset, mostly with myself for not finding the issue earlier, and I promised myself to find a better tool. I needed a way to transform the raw logging data into useful information so I could first understand and then tackle the problem. In this article I will show what I have put together over the weekend. Part I describes the general approach and applies to any application out there, no matter what language or framework you are using. Part II describes one possible implementation of this approach using Play Framework.
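The "spikes and pauses" problem in the quote can be made concrete with a small sketch. This is not Matthias's implementation (his uses Play Framework); it is a minimal Python illustration that assumes a hypothetical log layout with a millisecond timestamp at the start of each line. The names `parse_timestamp` and `find_pauses`, and the two-second threshold, are mine.

```python
from datetime import datetime, timedelta

# Assumed (hypothetical) log layout: "2013-07-13 10:15:02.120 INFO message text"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_timestamp(line):
    """Return the timestamp at the start of a log line, or None if absent."""
    try:
        return datetime.strptime(line[:23], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def find_pauses(lines, threshold=timedelta(seconds=2)):
    """Yield (gap, previous_line, current_line) where the gap exceeds threshold."""
    previous_time, previous_line = None, None
    for line in lines:
        current_time = parse_timestamp(line)
        if current_time is None:
            continue  # skip stack traces and other lines without a timestamp
        if previous_time is not None and current_time - previous_time > threshold:
            yield current_time - previous_time, previous_line, line
        previous_time, previous_line = current_time, line


sample = [
    "2013-07-13 10:15:02.120 INFO client connected",
    "2013-07-13 10:15:02.480 INFO message sent",
    "    at some.stack.Frame(Unknown)",
    "2013-07-13 10:15:09.950 INFO message sent",
    "2013-07-13 10:15:10.010 INFO client disconnected",
]

pauses = list(find_pauses(sample))
for gap, before, after in pauses:
    print(f"{gap.total_seconds():.2f}s pause before: {after}")
```

Because `find_pauses` is a generator, it can be fed an open file handle as easily as a list, so thousands of lines need not be held in memory.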

Starting point for transforming selected log events into subjects represented by topics?

Not sure I would want to generate IRIs to identify the events as subjects, particularly since they already have identifiers in the log.

A broader processing model for the TAO should allow for the use of user defined identifiers.

What is the Latin for: User Beware? 😉

How to set up Semantic Logging…

Friday, February 15th, 2013

How to set up Semantic Logging: part one with Logstash, Kibana, ElasticSearch and Puppet, by Henrik Feldt.

While we are on the topic of semantic logging:

Logging today is mostly done too unstructured; each application developer has his own syntax for the logs, optimized for his personal requirements and when it is time to deploy, ops consider themselves lucky if there is even some logging in the application, and even luckier if that logging can be used to find problems as they occur by being able to adjust verbosity where needed.

I’ve come to the point where I want a really awesome piece of logging from the get-go – something I can pick up and install in a couple of minutes when I come to a new customer site without proper operations support.

I want to be able to be able to search, drill down into, filter out patterns and have good tooling that allow me to let logging be an obvious support as the application is brought through its life cycle, from development to production. And I don’t want to write my own log parsers, thank you very much!

That’s where semantic logging comes in – my applications should be broadcasting log data in a manner that allow code to route, filter and index it. That’s why I’ve spent a lot of time researching how logging is done in a bloody good manner – this post and upcoming ones will teach you how to make your logs talk!

It’s worth noting that you can read this post no matter your programming language. In fact, the tooling that I’m about to discuss will span multiple operating systems; Linux, Windows, and multiple programming languages: Erlang, Java, Puppet, Ruby, PHP, JavaScript and C#. I will demo logging from C#/Win initially and continue with Python, Haskell and Scala in upcoming posts.

I didn’t see any posts following this one. But it is complete enough to get you started on semantic logging.
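For a sense of what "broadcasting log data in a manner that allows code to route, filter and index it" can look like, here is a minimal sketch in Python using only the standard library. It is not taken from Henrik's post (which demos C#); the `JsonFormatter` class, the `fields` convention for passing structured data, and the logger name are assumptions of mine.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"fields": {...}}
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


# Log to an in-memory stream so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("shop.orders")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the record out of any root handlers

logger.info("order_placed", extra={"fields": {"order_id": 1234, "total_cents": 5999}})

line = stream.getvalue().strip()
parsed = json.loads(line)
print(line)
```

A timestamp is left out to keep the output deterministic; a real setup would add one. The point is that the order id and total survive as typed values a downstream tool can filter on, rather than as fragments of a message string someone has to parse.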

Embracing Semantic Logging

Friday, February 15th, 2013

Embracing Semantic Logging by Grigori Melnik.

From the post:

In the world of software engineering, every system needs to log. Logging helps to diagnose and troubleshoot problems with your system both in development and in production. This requires proper, well-designed instrumentation. All too often, however, developers instrument their code to do logging without having a clear strategy and without thinking through the ways the logs are going to be consumed, parsed, and interpreted. Valuable contextual information about events frequently gets lost, or is buried inside the log messages. Furthermore, in some cases logging is done simply for the sake of logging, more like a checkmark on the list. This situation is analogous to people fallaciously believing their backup system is properly implemented by enabling the backup but never, actually, trying to restore from those backups.

This lack of a thought-through logging strategy results in systems producing huge amounts of log data which is less useful or entirely useless for problem resolution.

Many logging frameworks exist today (including our own Logging Application Block and log4net). In a nutshell, they provide high-level APIs to help with formatting log messages, grouping (by means of categories or hierarchies) and writing them to various destinations. They provide you with an entry point – some sort of a logger object through which you call log writing methods (conceptually, not very different from Console.WriteLine(message)). While supporting dynamic reconfiguration of certain knobs, they require the developer to decide upfront on the template of the logging message itself. Even when this can be changed, the message is usually intertwined with the application code, including metadata about the entry such as the severity and entry id.

As ever in all discussions, even those of semantics, there is some impedance:

Imagine another world, where the events get logged and their semantic meaning is preserved. You don’t lose any fidelity in your data. Welcome to the world of semantic logging. Note, some people refer to semantic logging as “structured logging”, “strongly-typed logging” or “schematized logging”.

Whatever you want to call it:

The technology to enable semantic logging in Windows has been around for a while (since Windows 2000). It’s called ETW – Event Tracing for Windows. It is a fast, scalable logging mechanism built into the Windows operating system itself. As Vance Morrison explains, “it is powerful because of three reasons:

  1. The operating system comes pre-wired with a bunch of useful events
  2. It can capture stack traces along with the event, which is INCREDIBLY USEFUL.
  3. It is extensible, which means that you can add your own information that is relevant to your code.”

ETW has been improved in .NET Framework 4.5 but I will leave you to Grigori’s post to ferret out those details.

Semantic logging is important for all the reasons mentioned in Grigori’s post and because captured semantics provide grist for semantic mapping mills.

Fluentd: the missing log collector

Saturday, December 10th, 2011

Fluentd: the missing log collector

From the post:

The Problems

The fundamental problem with logs is that they are usually stored in files although they are best represented as streams (by Adam Wiggins, CTO at Heroku). Traditionally, they have been dumped into text-based files and collected by rsync in hourly or daily fashion. With today’s web/mobile applications, this creates two problems.

Problem 1: Need Ad-Hoc Parsing

The text-based logs have their own format, and the analytics engineer needs to write a dedicated parser for each format. However, You are a DATA SCIENTIST, NOT A PARSER GENERATOR, right? 🙂

Problem 2: Lacks Freshness

The logs lag. The realtime analysis of user behavior makes feature iterations a lot faster. A nimbler A/B testing will help you differentiate your service from competitors.

This is where Fluentd comes in. We believe Fluentd solves all issues of scalable log collection by getting rid of files, and turns logs into true semi-structured data streams.

If you are interested in log file processing, take a look at Fluentd and compare it to the competition.
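To make the "dedicated parser for each format" complaint and the "semi-structured data streams" alternative concrete, here is a rough Python sketch. It does not use Fluentd or imitate its API; it only shows text lines being turned lazily into records that a consumer can aggregate. The access-log layout, the `ACCESS_LINE` pattern and the `records` function are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed (hypothetical) layout: 'host - - [time] "METHOD path HTTP/x" status size'
ACCESS_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def records(lines):
    """Lazily turn raw text lines into dict records, skipping unparseable ones."""
    for line in lines:
        match = ACCESS_LINE.match(line)
        if match is None:
            continue
        record = match.groupdict()
        record["status"] = int(record["status"])
        record["size"] = 0 if record["size"] == "-" else int(record["size"])
        yield record


raw = [
    '10.0.0.1 - - [10/Dec/2011:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Dec/2011:13:55:37 +0000] "POST /login HTTP/1.1" 302 -',
    'not a log line',
    '10.0.0.1 - - [10/Dec/2011:13:55:39 +0000] "GET /missing HTTP/1.1" 404 512',
]

# The consumer works on records as they arrive, not on a finished file.
status_counts = Counter(r["status"] for r in records(raw))
print(status_counts)
```

Every new text format needs another pattern like `ACCESS_LINE`, which is the ad-hoc parsing burden the post describes. If the application emitted records directly, only the consumer half of this sketch would remain.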

As for logs as streams, I think the “file view” of most data, logs or not, isn’t helpful. What does it matter to me if the graphs for a document are being generated in real time by a server and updated in my document? Or that a select bibliography is being updated so that readers get the late-breaking research in a fast-developing field?

The “fixed text” of a document is a view based upon the production means for documents. When those production means change, so should our view of documents.

Social Data and Log Analysis Using MongoDB

Tuesday, March 1st, 2011

Social Data and Log Analysis Using MongoDB

Interesting use of MongoDB.

Work through the slide deck and consider the following questions along the way:

  1. How would your analysis of the logs (the process of analysis) be different if you were using topic maps?
  2. How would your results from #1 be different?
  3. Choose a set of logs and test your answers to #1 and #2.

(Credit will be equally awarded whether #3 confirms or contradicts your analysis in #1 and #2. The purpose of the exercise is to develop a “feel” for fruitful areas of investigation.)