From the post:
The fundamental problem with logs is that they are usually stored in files although they are best represented as streams (by Adam Wiggins, CTO at Heroku). Traditionally, they have been dumped into text-based files and collected by rsync in hourly or daily fashion. With today’s web/mobile applications, this creates two problems.
Problem 1: Need Ad-Hoc Parsing
The text-based logs have their own format, and the analytics engineer needs to write a dedicated parser for each format. However, You are a DATA SCIENTIST, NOT A PARSER GENERATOR, right? 🙂
Problem 2: Lacks Freshness
The logs lag. The realtime analysis of user behavior makes feature iterations a lot faster. A nimbler A/B testing will help you differentiate your service from competitors.
This is where Fluentd comes in. We believe Fluentd solves all issues of scalable log collection by getting rid of files, and turns logs into true semi-structured data streams.
If you are interested in log file processing, take a look at Fluentd and compare it to the competition.
As far as logs as streams, I think the “file view” of most data, logs or not, isn’t helpful. What does it matter to me if the graphs for a document are being generated in real time by a server and updated in my document? Or that a select bibliography is being updated so that readers get the late breaking research in a fast developing field?
The “fixed text” of a document is a view based upon the production means for documents. When those production means change, so should our view of documents.