Fingerprinting Data/Relationships/Subjects?

Virtual image library fingerprints data

From the post:

It’s inevitable. Servers crash. Applications misbehave. Even if you troubleshoot and figure out the problem, the process of problem diagnosis will likely involve numerous investigative actions to examine the configurations of one or more systems—all of which would be difficult to describe in any meaningful way. And every time you encounter a similar problem, you could end up repeating the same complex process of problem diagnosis and remediation.

As someone who deals with just such scenarios in my role as manager of the Scalable Datacenter Analytics Department at IBM Research, my team and I realized we needed a way to “fingerprint” known bad configuration states of systems. This way, we could reduce the problem diagnosis time by relying on fingerprint recognition techniques to narrow the search space.

Project Origami was thus born from this desire to develop an easier-to-use problem diagnosis system to troubleshoot misconfiguration problems in the data center. Origami, today a collaboration between IBM Open Collaborative Research, Carnegie Mellon University, the University of Toronto, and the University of California at San Diego, is a collection of tools for fingerprinting, discovering, and mining configuration information on a data center-wide scale. It uses public domain virtual image library, Olive, an idea created under this Open Collaborative Research a few years ago.

It even provides an ad-hoc interface to the users, as there is no rule language for them to learn. Instead, users give Origami an example of what they deem to be a bad configuration, which Origami fingerprints and adds to its knowledge base. Origami then continuously crawls systems in the data center, monitoring the environment for configuration patterns that match known bad fingerprints in its knowledge base. A match triggers deeper analytics that then examine those systems for problematic configuration settings.

Identifications of data, relationships and subjects could be expressed as “fingerprints.”

Searching by “fingerprints” would be far easier than any query language.

Reasoning that searching challenges users to bridge the semantic gap between them and content authors.

Query languages add another semantic gap, between users and query language designers.

Why useful results are obtained at all using query languages remains unexplained.

Comments are closed.