From the website:
logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs.
I mention this for two reasons:
First, obviously, as a tool for mining/searching logs. Deciding which subjects in a log will later appear in a topic map starts with discovering those subjects.
Secondly, perhaps less obviously, I am thinking that adding subject identity to events discovered in logs could enable mapping across logs, say, for example, if you were mining TCP/IP packet traffic.
Can’t imagine why anyone would be sitting on or near a big switch doing that, ;-), but just to cover all the edge cases.
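To make the "mapping across logs" idea a bit more concrete, here is a minimal sketch in Python. It is not anything logstash does for you. The event fields (`src_ip`, `msg`), the two sample logs, and the identity rule in `subject_id` are all assumptions made up for illustration; a real topic map would need a much more careful notion of subject identity than "same IP address".

```python
from collections import defaultdict

def subject_id(event):
    """Hypothetical identity rule: a host is identified by its source IP."""
    return "ip:" + event["src_ip"]

def merge_by_subject(*logs):
    """Group events from several named logs under a shared subject identifier."""
    merged = defaultdict(list)
    for log_name, events in logs:
        for event in events:
            merged[subject_id(event)].append((log_name, event["msg"]))
    return dict(merged)

# Made-up sample events from two separate logs.
firewall = [{"src_ip": "10.0.0.5", "msg": "blocked port 23"}]
web = [
    {"src_ip": "10.0.0.5", "msg": "GET /login"},
    {"src_ip": "10.0.0.9", "msg": "GET /"},
]

merged = merge_by_subject(("firewall", firewall), ("web", web))
```

Events from both logs that share a subject identifier end up under one key, which is the point: the identifier, not the log file, becomes the unit you map on.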
If you filtered out all the known porn site and search engine traffic, both of which are large but knowable lists, the amount of stuff you have to process starts to look pretty manageable.
Does anyone know the ratio of porn/search to other traffic into the Pentagon? Or Congress? Just curious if there is a useful baseline.
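For what it is worth, the filtering step could be sketched roughly as below, assuming each event carries a `host` field and that you have a list of known domains to discard. The domain names, the field name, and the helper functions are hypothetical placeholders, not a real blocklist or a logstash feature.

```python
# Hypothetical list of "known" domains to discard; a real list would be far larger.
KNOWN_HOSTS = {"search.example", "adult.example"}

def is_known(host, known=KNOWN_HOSTS):
    """True if the host, or any parent domain of it, is on the known list."""
    parts = host.lower().rstrip(".").split(".")
    return any(".".join(parts[i:]) in known for i in range(len(parts)))

def filter_unknown(events, known=KNOWN_HOSTS):
    """Keep only events whose host is not on the known list."""
    return [e for e in events if not is_known(e["host"], known)]

# Made-up sample events.
events = [
    {"host": "www.search.example"},
    {"host": "intranet.example"},
    {"host": "adult.example"},
]

remaining = filter_unknown(events)
```

Comparing `len(remaining)` to `len(events)` over a real capture would give exactly the kind of ratio asked about above.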