Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 31, 2012

The Dwarf OLAP Engine

Filed under: Data Cubes,Dwarf Cubes — Patrick Durusau @ 4:31 pm


From the webpage:

Dwarf is a patented (US Patent 7,133,876) highly compressed structure for computing, storing, and querying Data Cubes. Its reductions reach 1:1,000,000, depending on the data distribution. The method is based on finding prefix and suffix redundancies in high-dimensional data. Prefix redundancies occur in dense areas of the cube, and some existing techniques have exploited them. However, we discovered that suffix redundancy is much higher in sparse areas of multi-dimensional space. Together, the two fuse the exponential sizes of high-dimensional cubes into a dramatically condensed LOSSLESS store.
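To make the prefix/suffix idea concrete, here is a minimal Python sketch of a Dwarf-like structure, under stated assumptions. It is not the patented algorithm or the evaluation software: the names `build`, `lookup`, `full_cube_cells` and `ALL` are hypothetical, the aggregate is assumed to be SUM, the four-row fact table is invented, and suffix coalescing is approximated by hash-consing (storing each structurally identical sub-structure once). Each tree level is one dimension, and every node has one cell per value present plus an `ALL` cell. Rows with shared leading values therefore share paths, and identical suffixes collapse into a single node.

```python
from itertools import combinations

ALL = "*"  # placeholder for the ALL (aggregated-away) value of a dimension


def build(rows, level, ndims, pool):
    """Return the id of a Dwarf-like node for `rows`, starting at dimension `level`.

    rows: list of (dims, measure) with dims a tuple of length ndims.
    pool: dict mapping (level, frozenset of cells) -> node id. Structurally
    identical sub-structures get the same id and are stored once; this
    hash-consing stands in for Dwarf's suffix coalescing.
    """
    groups = {}
    for dims, measure in rows:
        groups.setdefault(dims[level], []).append((dims, measure))
    if level == ndims - 1:
        # Leaf level: each cell holds a SUM aggregate instead of a pointer.
        cells = {v: sum(m for _, m in g) for v, g in groups.items()}
        cells[ALL] = sum(m for _, m in rows)
    else:
        # Inner level: each value, plus ALL, points to a child node. Rows that
        # share leading dimension values share this path (prefix sharing).
        cells = {v: build(g, level + 1, ndims, pool) for v, g in groups.items()}
        cells[ALL] = build(rows, level + 1, ndims, pool)
    key = (level, frozenset(cells.items()))
    return pool.setdefault(key, len(pool))


def full_cube_cells(rows, ndims):
    """Count the non-empty cells of the full cube (every group-by, every value)."""
    return sum(
        len({tuple(dims[i] for i in subset) for dims, _ in rows})
        for k in range(ndims + 1)
        for subset in combinations(range(ndims), k)
    )


def lookup(nodes, root, coords):
    """Follow one cell per dimension from the root; return the aggregate or None."""
    node = root
    for c in coords:
        cells = nodes[node]
        if c not in cells:
            return None
        node = cells[c]
    return node


facts = [
    (("S1", "C2", "P2"), 70),
    (("S1", "C3", "P1"), 40),
    (("S2", "C1", "P1"), 90),
    (("S2", "C1", "P2"), 50),
]
pool = {}
root = build(facts, 0, 3, pool)
nodes = {node_id: dict(cells) for (level, cells), node_id in pool.items()}
leaf_aggregates = sum(len(cells) for level, cells in pool if level == 2)

print(full_cube_cells(facts, 3))              # 23
print(len(pool), leaf_aggregates)             # 9 13
print(lookup(nodes, root, (ALL, "C1", ALL)))  # 140
```

On this toy table, all 23 cells of the full cube are reachable as root-to-leaf paths, yet only 13 aggregate values are stored in 9 nodes. For example, following `S2` then `C1` and following `S2` then `ALL` land on the same node. Because the sketch computes every sub-structure before deduplicating it, it shows the shape of the stored result, not the construction speed claimed above. A table this small also says nothing about the quoted compression ratios.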

With the Dwarf technology, we managed to create the first lossless full PetaCube in a Dwarf store of 2.1 GBytes, with a construction time of 80 minutes. The PetaCube is built on a 25-dimensional fact table that generates a full cube of a petabyte if stored in binary (all 2^25 possible un-indexed views/summary tables with two aggregate values). This is 1000-fold bigger than Microsoft’s TeraCube of the future. We also surpassed the fastest OLAP Council APB-1 benchmark result at density 5, published by Oracle: the Dwarf Cube takes 20 minutes to create and occupies 3 GB, compared to Oracle’s 4.5 hours and 30+ GB. We further pushed the APB-1 benchmark to its maximum possible density of 40 in just 7 hours of compute time and about 10 GB of storage. To the best of our knowledge, no one else has even tried this. This enormous storage reduction comes with NO loss of information and provides a fully indexed cube that includes the original fact table.
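The headline figures can be sanity-checked with a few lines of arithmetic. This Python snippet assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and treats Oracle's "30+ GB" as exactly 30 GB, so the size ratio is a lower bound. The variable names are illustrative only.

```python
# Back-of-envelope checks on the PetaCube and APB-1 figures quoted above.
# Assumption: "Petabyte" and "GBytes" are read as decimal units.

dims = 25
views = 2 ** dims                      # one group-by view per subset of the 25 dimensions
full_cube_bytes = 10 ** 15             # ~1 PB full cube
dwarf_bytes = 21 * 10 ** 8             # 2.1 GB Dwarf store
ratio = full_cube_bytes / dwarf_bytes  # compression factor

# APB-1 at density 5: Oracle 4.5 hours / 30+ GB vs. Dwarf 20 minutes / 3 GB.
speedup = (4.5 * 60) / 20              # build-time ratio
shrink = 30 / 3                        # size ratio (lower bound, since Oracle's was "30+")

print(views)            # 33554432
print(round(ratio))     # 476190
print(speedup, shrink)  # 13.5 10.0
```

Under these assumptions the PetaCube store is roughly 476,000 times smaller than the full cube (about 499,000 times with binary units), in line with the "reaching 1:1,000,000" claim. The density-5 APB-1 cube was built about 13.5 times faster in at most a tenth of the space.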

The most important aspect of this patented Dwarf technology is that its data fusion (prefix and suffix redundancy elimination) is discovered and eliminated BEFORE the cube is computed, which explains the dramatic reduction in compute time. A complete version of the Dwarf Cube software, with full support for hierarchies, is available to interested parties under an NDA and a 90-day evaluation agreement.
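The claim that redundancy is eliminated before the cube is computed can be illustrated with the simplest case of suffix coalescing. When a node has only one distinct value, its `ALL` sub-cube is identical to that value's sub-cube, so the builder can point at the existing one instead of computing it. This Python sketch is an illustration under assumptions, not the patented method: the names `build`, `ALL` and `stats`, the SUM aggregate and the toy fact table are hypothetical, and it handles only this one-value case.

```python
ALL = "*"  # placeholder for the ALL (aggregated-away) value of a dimension
stats = {"computed": 0, "skipped": 0}  # inner ALL sub-cubes built vs. reused


def build(rows, level, ndims):
    """Return a nested-dict node for `rows`, starting at dimension `level`.

    rows: list of (dims, measure) with dims a tuple of length ndims.
    Leaf cells hold SUM aggregates; inner cells hold child nodes.
    """
    groups = {}
    for dims, measure in rows:
        groups.setdefault(dims[level], []).append((dims, measure))
    if level == ndims - 1:
        cells = {v: sum(m for _, m in g) for v, g in groups.items()}
        cells[ALL] = sum(m for _, m in rows)
        return cells
    cells = {v: build(g, level + 1, ndims) for v, g in groups.items()}
    if len(groups) == 1:
        # Only one distinct value: the ALL sub-cube would be identical, so
        # reuse the existing child object instead of computing it again.
        cells[ALL] = next(iter(cells.values()))
        stats["skipped"] += 1
    else:
        cells[ALL] = build(rows, level + 1, ndims)
        stats["computed"] += 1
    return cells


facts = [
    (("S1", "C2", "P2"), 70),
    (("S1", "C3", "P1"), 40),
    (("S2", "C1", "P1"), 90),
    (("S2", "C1", "P2"), 50),
]
root = build(facts, 0, 3)
print(stats)                                # {'computed': 3, 'skipped': 1}
print(root["S2"][ALL] is root["S2"]["C1"])  # True
```

Here the `S2` node has only the value `C1`, so its `ALL` cell reuses that child (one sub-cube skipped), while the three inner nodes with several values still build their `ALL` sub-cubes. Coalescing across different branches, such as the identical `C2` sub-cubes under `S1` and under `ALL`, would need more machinery than this sketch attempts.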

The Dwarf cube was mentioned in a thread on GPUs and database engines, with one commenter lamenting the fact that it is patented.

Patented or not, a quick look at the literature and results for the Dwarf Cube makes it an “item of interest” for complex data cubes.

True, it is not “real time,” since you have to build the cube first, but “real time” is not a universal requirement.

Curious whether any topic map vendors or research labs have investigated the use of Dwarf cubes as delivery structures for topic maps? (I say “delivery” since the cubes lend themselves to delivering a computed artifact.)
