Archive for the ‘Data Cubes’ Category

Multi-Dimensional Images / Data Cubes

Tuesday, February 4th, 2014

Accessing Multi-Dimensional Images and Data Cubes In the Virtual Observatory by Bruce Berriman.

From the post:

New instruments and missions are routinely producing multi-dimensional datasets, such as Doppler velocity cubes and time-resolved movies. Observatories such as ALMA and new integral field spectrographs on ground-based telescopes are generating data cubes, and future missions such as LSST and JWST will generate ever larger volumes of them. Thus the VO, via its standards body, the International Virtual Observatory Alliance (IVOA), has made it a priority to develop, by September 2014, a protocol for discovering data cubes and a reference service for accessing and downloading them.

Bruce includes a poster with a summary of the Simple Image Access Protocol (SIAP, v2).

For more details, consider the SIAP version 2.0 working draft.
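To get a feel for the style of protocol involved, here is a minimal sketch of forming a SIAP v2-style discovery query using only the Python standard library. The service endpoint URL is hypothetical; the `POS` and `BAND` parameters follow the pattern described in the working draft, but check the draft itself for the authoritative parameter set.

```python
# Sketch of a SIA v2-style query URL. The base URL below is a
# hypothetical placeholder, not a real service.
from urllib.parse import urlencode

base = "https://example.org/sia2/query"  # hypothetical endpoint
params = {
    "POS": "CIRCLE 180.0 -30.0 0.5",   # cone search: RA, Dec, radius (deg)
    "BAND": "0.0021 0.0022",           # wavelength interval in metres
}
url = f"{base}?{urlencode(params)}"
print(url)
```

The response from a compliant service would describe the matching data cubes, which the client can then download via the access URLs it returns.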

The experience with SIAP will be useful when other domains scale up to the current astronomy data requirements.

RDF Data Cube Vocabulary [Last Call ends 08 April 2013]

Tuesday, March 12th, 2013

RDF Data Cube Vocabulary


There are many situations where it would be useful to be able to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts. The Data Cube vocabulary provides a means to do this using the W3C RDF (Resource Description Framework) standard. The model underpinning the Data Cube vocabulary is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations. The Data Cube vocabulary is a core foundation which supports extension vocabularies to enable publication of other aspects of statistical data flows or other multi-dimensional data sets.
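The core of the model is simple enough to sketch in plain Python: a data set is a collection of observations, each of which fixes a value for every dimension and carries one or more measured values. The dimension and measure names below are illustrative examples, not normative vocabulary terms.

```python
# Hypothetical sketch of the Data Cube model's core idea:
# observations = dimension values + measure values.
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    dimensions: tuple   # e.g. (("refArea", "UK"), ("refPeriod", "2011"))
    measures: tuple     # e.g. (("lifeExpectancy", 80.1),)

dataset = [
    Observation((("refArea", "UK"), ("refPeriod", "2011")),
                (("lifeExpectancy", 80.1),)),
    Observation((("refArea", "FR"), ("refPeriod", "2011")),
                (("lifeExpectancy", 81.7),)),
]

# A slice fixes some dimensions and varies the rest (cf. qb:Slice):
slice_2011 = [o for o in dataset
              if dict(o.dimensions)["refPeriod"] == "2011"]
```

In the vocabulary itself these pieces become RDF resources (`qb:DataSet`, `qb:Observation`, and so on), which is what makes the data linkable to other sets and concepts on the web.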

If you have comments, now would be a good time to finish them up for submission.

I first saw this in a tweet by Sandro Hawke.

The Dwarf OLAP Engine

Tuesday, January 31st, 2012

The Dwarf OLAP Engine

From the webpage:

Dwarf is a patented (US Patent 7,133,876) highly compressed structure for computing, storing, and querying data cubes, with reduction ratios reaching 1:1,000,000 depending on the data distribution. The method is based on finding prefix and suffix redundancies in high-dimensional data. Prefix redundancies occur in dense areas of the cube, and some existing techniques have exploited them. However, we discovered that suffix redundancy is far higher in the sparse areas of multi-dimensional space. Eliminating both fuses the exponential sizes of high-dimensional cubes into a dramatically condensed LOSSLESS store.

With the Dwarf technology, we managed to create the first lossless full PetaCube in a Dwarf store of 2.1 GBytes with a construction time of 80 minutes. The PetaCube is built on a 25-dimensional fact table that generates a full cube of a petabyte in size if stored in binary (all possible 2^25 un-indexed views/summary tables with two aggregate values). This is 1,000-fold bigger than Microsoft’s TeraCube of the future. We also surpassed the fastest OLAP Council APB-1 benchmark (density 5) result published by Oracle: the Dwarf cube creation time is 20 minutes and its size 3 GB, compared to Oracle’s 4.5 hours and 30+ GB. We further pushed the APB-1 benchmark to its maximum possible density of 40 in just 7 hours of compute time and about 10 GB in size. To the best of our knowledge, no one else has even tried this. This enormous storage reduction comes with NO loss of information and provides a fully indexed cube that includes the original fact table.

The most important aspect of this patented Dwarf technology is that the prefix and suffix redundancies are discovered and eliminated BEFORE the cube is computed, which explains the dramatic reduction in compute time. A complete version of the Dwarf Cube software, with full support for hierarchies, is available to interested parties under an NDA and a 90-day evaluation agreement.
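The suffix-redundancy idea described above can be illustrated with a toy sketch (this is my own hypothetical illustration, not the patented Dwarf implementation): identical sub-cubes are detected by hash-consing, so each distinct suffix is stored exactly once.

```python
# Toy sketch of suffix-redundancy elimination: identical sub-nodes
# of a full cube are interned and stored once.
_interned = {}

def intern_node(node):
    """Return the single stored copy of a node; identical suffixes collapse."""
    return _interned.setdefault(node, node)

def build_node(rows, dim, ndims):
    """Build a full-cube node over rows, one dimension at a time.

    rows  : list of (dimension_tuple, measure_value)
    dim   : index of the current dimension
    ndims : total number of dimensions
    """
    if dim == ndims:
        return sum(v for _, v in rows)  # leaf: the aggregate value
    groups = {}
    for dims, v in rows:
        groups.setdefault(dims[dim], []).append((dims, v))
    children = tuple(sorted(
        (key, build_node(grp, dim + 1, ndims)) for key, grp in groups.items()
    ))
    # the ALL ('*') cell aggregates over every value of this dimension
    children += (("*", build_node(rows, dim + 1, ndims)),)
    return intern_node(children)

rows = [(("store1", "jan"), 10), (("store2", "jan"), 10)]
cube = build_node(rows, 0, 2)
# store1 and store2 have identical sub-cubes, so one copy is shared:
print(cube[0][1] is cube[1][1])  # True
```

The real Dwarf structure goes further, detecting the redundancies during construction rather than after the fact, which is where its compute-time savings come from.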

The Dwarf cube was mentioned in a thread on GPUs and database engines, with one commenter lamenting the fact that it is under patent.

Patented or not, a quick look at the literature and results for the Dwarf Cube make it an “item of interest” for complex data cubes.

True, not “real time” since you have to build the cube but “real time” is not a universal requirement.

Curious if any of the topic map vendors or research labs have investigated the use of Dwarf cubes as delivery structures for topic maps? (Saying delivery since the cubes would lend themselves to delivery of a computed artifact.)