Paper: A Study of Practical Deduplication
From the post:
With BigData comes BigStorage costs. One way to store less is simply not to store the same data twice. That’s the radically simple and powerful notion behind data deduplication. If you are one of those who got a good laugh out of the idea of eliminating SQL queries as a rather obvious scalability strategy, you’ll love this one, but it is a powerful feature and one I don’t hear talked about outside the enterprise. A parallel idea in programming is the once-and-only-once principle of never duplicating code.
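To make the quoted idea concrete, here is a minimal sketch of exact deduplication. It is not taken from the paper or the post: the class name `DedupStore`, the fixed-size chunking, and the use of SHA-256 as the chunk fingerprint are all assumptions chosen for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical): identical chunks are kept once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes, stored only once
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        """Split data into fixed-size chunks and store each distinct chunk once."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault leaves an already-stored chunk untouched
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Storing the same file under two names costs no extra chunk storage in this sketch, only a second list of digests.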
Someone asked the other day about how to make topic maps profitable.
Well, selling a solution to problems like duplication of data would be one way.
You do know that the kernel of the idea for topic maps arose out of a desire to avoid paying 2X, 3X, 4X, or more for the same documentation on military equipment, yes? It ultimately didn’t fly because of the markup that contractors get on documentation, which in turn funds their hiring of military retirees. That doesn’t mean the original idea was a bad one.
Now, applying a topic map to military documentation systems and demonstrating the duplication of content, perhaps using one of Lars Marius Garshol’s similarity measures: that sounds like a rocking topic map application, particularly in budget-cutting times.
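As a rough sketch of what demonstrating duplicated content could look like, the following flags pairs of documents whose wording overlaps heavily. It uses plain Jaccard similarity over word shingles, a generic technique picked here for illustration and not necessarily one of the measures mentioned above. The function names, the shingle size `k`, and the threshold are all assumptions.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in a text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle; empty texts yield no shingles
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Return (name1, name2, similarity) for document pairs at or above threshold."""
    sets = {name: shingles(text, k) for name, text in docs.items()}
    pairs = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs
```

Comparing every pair is quadratic in the number of documents, so a real documentation collection would need some form of blocking or indexing before this step.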