Archive for the ‘Log Analysis’ Category

How to set up Semantic Logging…

Friday, February 15th, 2013

How to set up Semantic Logging: part one with Logstash, Kibana, ElasticSearch and Puppet, by Henrik Feldt.

While we are on the topic of semantic logging:

Logging today is mostly done too unstructured; each application developer has his own syntax for the logs, optimized for his personal requirements and when it is time to deploy, ops consider themselves lucky if there is even some logging in the application, and even luckier if that logging can be used to find problems as they occur by being able to adjust verbosity where needed.

I’ve come to the point where I want a really awesome piece of logging from the get-go – something I can pick up and install in a couple of minutes when I come to a new customer site without proper operations support.

I want to be able to search, drill down into, filter out patterns and have good tooling that allows me to let logging be an obvious support as the application is brought through its life cycle, from development to production. And I don’t want to write my own log parsers, thank you very much!

That’s where semantic logging comes in – my applications should be broadcasting log data in a manner that allows code to route, filter and index it. That’s why I’ve spent a lot of time researching how logging is done in a bloody good manner – this post and upcoming ones will teach you how to make your logs talk!

It’s worth noting that you can read this post no matter your programming language. In fact, the tooling that I’m about to discuss will span multiple operating systems; Linux, Windows, and multiple programming languages: Erlang, Java, Puppet, Ruby, PHP, JavaScript and C#. I will demo logging from C#/Win initially and continue with Python, Haskell and Scala in upcoming posts.

I didn’t see any posts following this one. But it is complete enough to get you started on semantic logging.
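
To make the idea concrete, here is a minimal sketch of log data that code can route, filter and index. It is not taken from Henrik's post, which demos C# with Logstash. It assumes Python's standard logging module, and the names JsonFormatter, build_logger and the "fields" convention are hypothetical choices for illustration only.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object instead of free text."""

    def format(self, record):
        event = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Hypothetical convention: callers attach structured data
        # through extra={"fields": {...}} on the logging call.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True, default=str)


def build_logger(name, stream):
    """Return a logger that writes one JSON event per line to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Drop handlers from earlier calls so events are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    import sys

    log = build_logger("checkout", sys.stdout)
    log.info("order_placed", extra={"fields": {"order_id": 1234, "total": 59.9}})
```

Each line of output is a self-describing event, so a downstream indexer can filter on order_id or level without a hand-written parser.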

Embracing Semantic Logging

Friday, February 15th, 2013

Embracing Semantic Logging by Grigori Melnik.

From the post:

In the world of software engineering, every system needs to log. Logging helps to diagnose and troubleshoot problems with your system both in development and in production. This requires proper, well-designed instrumentation. All too often, however, developers instrument their code to do logging without having a clear strategy and without thinking through the ways the logs are going to be consumed, parsed, and interpreted. Valuable contextual information about events frequently gets lost, or is buried inside the log messages. Furthermore, in some cases logging is done simply for the sake of logging, more like a checkmark on the list. This situation is analogous to people fallaciously believing their backup system is properly implemented by enabling the backup but never, actually, trying to restore from those backups.

This lack of a thought-through logging strategy results in systems producing huge amounts of log data which is less useful or entirely useless for problem resolution.

Many logging frameworks exist today (including our own Logging Application Block and log4net). In a nutshell, they provide high-level APIs to help with formatting log messages, grouping (by means of categories or hierarchies) and writing them to various destinations. They provide you with an entry point – some sort of a logger object through which you call log writing methods (conceptually, not very different from Console.WriteLine(message)). While supporting dynamic reconfiguration of certain knobs, they require the developer to decide upfront on the template of the logging message itself. Even when this can be changed, the message is usually intertwined with the application code, including metadata about the entry such as the severity and entry id.

As ever in all discussions, even those of semantics, there is some impedance:

Imagine another world, where the events get logged and their semantic meaning is preserved. You don’t lose any fidelity in your data. Welcome to the world of semantic logging. Note, some people refer to semantic logging as “structured logging”, “strongly-typed logging” or “schematized logging”.

Whatever you want to call it:

The technology to enable semantic logging in Windows has been around for a while (since Windows 2000). It’s called ETW – Event Tracing for Windows. It is a fast, scalable logging mechanism built into the Windows operating system itself. As Vance Morrison explains, “it is powerful because of three reasons:

  1. The operating system comes pre-wired with a bunch of useful events
  2. It can capture stack traces along with the event, which is INCREDIBLY USEFUL.
  3. It is extensible, which means that you can add your own information that is relevant to your code.”

ETW has been improved in .NET Framework 4.5 but I will leave you to Grigori’s post to ferret out those details.

Semantic logging is important for all the reasons mentioned in Grigori’s post and because captured semantics provide grist for semantic mapping mills.
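
The contrast Grigori draws between a message template and an event whose meaning is preserved can be sketched in a few lines. This is not ETW or any .NET API; it is a Python analogy using only the standard library, and the OrderFailed event type with its fields is a hypothetical example.

```python
from dataclasses import asdict, dataclass
import json


@dataclass(frozen=True)
class OrderFailed:
    """Hypothetical event type: every field keeps its own name and type."""
    order_id: int
    reason: str
    retry_count: int
    severity: str = "error"


def as_text(event):
    # Template-style logging: the fields are flattened into one string,
    # so a consumer has to parse them back out.
    return (f"Order {event.order_id} failed after "
            f"{event.retry_count} retries: {event.reason}")


def as_record(event):
    # Semantic logging: the event name and its typed payload survive.
    return {"event": type(event).__name__, **asdict(event)}


if __name__ == "__main__":
    failure = OrderFailed(order_id=7, reason="card declined", retry_count=2)
    print(as_text(failure))
    print(json.dumps(as_record(failure)))
```

The second form is what a semantic mapping needs: named, typed fields rather than a sentence to be reverse engineered.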

Fluentd: the missing log collector

Saturday, December 10th, 2011

Fluentd: the missing log collector

From the post:

The Problems

The fundamental problem with logs is that they are usually stored in files although they are best represented as streams (by Adam Wiggins, CTO at Heroku). Traditionally, they have been dumped into text-based files and collected by rsync in hourly or daily fashion. With today’s web/mobile applications, this creates two problems.

Problem 1: Need Ad-Hoc Parsing

The text-based logs have their own format, and the analytics engineer needs to write a dedicated parser for each format. However, you are a DATA SCIENTIST, NOT A PARSER GENERATOR, right? :-)

Problem 2: Lacks Freshness

The logs lag. The realtime analysis of user behavior makes feature iterations a lot faster. A nimbler A/B testing will help you differentiate your service from competitors.

This is where Fluentd comes in. We believe Fluentd solves all issues of scalable log collection by getting rid of files, and turns logs into true semi-structured data streams.

If you are interested in log file processing, take a look at Fluentd and compare it to the competition.
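
As a rough illustration of the two problems quoted above, the sketch below reads semi-structured events from a stream as they arrive and counts them by a field, with no format-specific parser. It does not use Fluentd or its API; it assumes one JSON object per line, and read_events, count_by and the sample data are hypothetical.

```python
import io
import json
from collections import Counter


def read_events(stream):
    """Yield one dict per line as lines arrive; skip blank or malformed lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Only JSON objects count as events in this sketch.
        if isinstance(event, dict):
            yield event


def count_by(events, field):
    """Count events by the value of `field`, ignoring events that lack it."""
    return Counter(e[field] for e in events if field in e)


if __name__ == "__main__":
    # Stand-in for a live stream such as a socket or pipe.
    sample = io.StringIO(
        '{"event": "signup", "variant": "A"}\n'
        '{"event": "signup", "variant": "B"}\n'
        'not json\n'
        '{"event": "signup", "variant": "B"}\n'
    )
    print(count_by(read_events(sample), "variant"))
```

Because read_events is a generator, the same code works on a file, a pipe or a network stream, which is the point of treating logs as streams rather than files.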

As for logs as streams, I think the “file view” of most data, logs or not, isn’t helpful. What does it matter to me if the graphs for a document are being generated in real time by a server and updated in my document? Or that a select bibliography is being updated so that readers get the late breaking research in a fast developing field?

The “fixed text” of a document is a view based upon the production means for documents. When those production means change, so should our view of documents.

Social Data and Log Analysis Using MongoDB

Tuesday, March 1st, 2011

Social Data and Log Analysis Using MongoDB

Interesting use of MongoDB.

Work through the slide deck and consider the following questions along the way:

  1. How would your analysis of the logs (the process of analysis) be different if you were using topic maps?
  2. How would your results from #1 be different?
  3. Choose a set of logs and test your answers to #1 and #2.

(Credit will be equally awarded whether #3 confirms or contradicts your analysis in #1 and #2. The purpose of the exercise is to develop a “feel” for fruitful areas of investigation.)