Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 11, 2013

FastBit:..

Filed under: Bitmap Indexes,FastBit,Indexing — Patrick Durusau @ 4:44 pm

FastBit: An Efficient Compressed Bitmap Index Technology

From the webpage:

FastBit is an open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user’s data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools.

Software

The FastBit software is distributed under the Less GNU Public License (LGPL). The software is available at codeforge.lbl.gov. The most recent release is FastBit ibis1.3.7; it comes as a source tar ball named fastbit-ibis1.3.7.tar.gz. The latest development version is available from http://goo.gl/Ho7ty.

Other items of interest:

FastBit related publications

The most recent entry in this list is from 2011. A quick search of the ACM Digital Library (for FastBit) found seventeen (17) articles for 2012 – 2013.

FastBit Users Guide

From the users guide:

This package implements a number of different bitmap indexes compressed with Word-Aligned Hybrid code. These indexes differ in their bitmap encoding methods and binning options. The basic bitmap index compressed with WAH has been shown to answer one-dimensional queries in time that is proportional to the number of hits in theory. In a number of performance measurements, WAH compressed indexes were found to be much more efficient than other indexes [CIKM 2001] [SSDBM 2002] [DOLAP 2002]. One of the crucial step in achieving these efficiency is to be able to perform bitwise OR operations on a large compressed bitmaps efficiently without decompression [VLDB 2004]. Numerous other bitmap encodings and binning strategies are implemented in this software package, please refer to indexSpec.html for descriptions on how to access these indexes and refer to our publications for extensive studies on these methods. FastBit was primarily developed to test these techniques for improving compressed bitmap indexes. Even though, it has grown to include a small number other useful data analysis functions, its primary strength is still in having a diversity of efficient compressed bitmap indexes.
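To make the idea in the users guide concrete, here is a minimal sketch of an equality-encoded bitmap index. It is not FastBit code and does not use the FastBit API. Plain Python integers stand in for the bitmaps, so there is no WAH compression and no binning. The function names and the sample data are invented for illustration. It only shows how a one-dimensional range query reduces to bitwise OR operations over per-value bitmaps.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Equality encoding: one bitmap per distinct value.

    Bit i of a bitmap is set when row i holds that value.
    Bitmaps are uncompressed Python ints (an assumption of this sketch).
    """
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def range_query(index, low, high):
    """Bitmap of rows with low <= value <= high, built by OR-ing bitmaps."""
    result = 0
    for value, bitmap in index.items():
        if low <= value <= high:
            result |= bitmap
    return result


def rows_of(bitmap):
    """Decode a bitmap into a sorted list of row numbers."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical column of readings, one per row.
temperatures = [21, 35, 21, 40, 35, 18]
index = build_bitmap_index(temperatures)
hits = rows_of(range_query(index, 30, 40))
print(hits)
```

A compressed index would store each bitmap as runs of identical bits and perform the OR directly on those runs, which is the step the users guide singles out as crucial.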

Just in case you want to follow up on the use of FastBit in RaptorDB.

January 30, 2012

Using Bitmap Indexes in Query Processing

Filed under: Bitmap Indexes,EWAB,HWAB,Query Language — Patrick Durusau @ 8:00 pm

Why are column oriented databases so much faster than row oriented databases?

Be sure to read all the comments. Some of the techniques described are covered by patents (according to the comments) but there are open source implementations of alternatives. There is also a good discussion of the trade-offs in using this technique.

Search terms: Hybrid Word Aligned Bitmaps, HWAB, EWAB, FastBit.

Follow the links in the post and comments for more resources.

Questions:

From a topic map perspective, how would you structure a set of relational tables to represent the information items defined by the Topic Maps Data Model? (Yes, it has been done before, but no peeking! Your result will likely be very similar, but I am interested in how you would structure the data.) (If you want to think ahead, same question for the various NoSQL options.)

For the relational database, how would you structure a chain of selects to choose all the information items that should merge with any particular item? In other words, start with the values of an item that should merge and construct a select that gathers up the other items with which it should merge.

Enumerate the operations you would need to perform post-select to present a “merged” information item to the final user.
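If you want something to compare your answer to the second question against, here is one possible sketch, and only a sketch. It assumes a single hypothetical table, item_identifier, that pairs an item with an identifier string, and it treats "shares an identifier" as the only merging rule. The Topic Maps Data Model has more rules than that. The table, the column names, and the example.org identifiers are all invented. The chain of selects is written as a recursive query in SQLite so that merges are followed transitively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_identifier (item_id INTEGER, identifier TEXT);
INSERT INTO item_identifier VALUES
    (1, 'http://example.org/puccini'),
    (2, 'http://example.org/puccini'),
    (2, 'http://example.org/giacomo-puccini'),
    (3, 'http://example.org/giacomo-puccini'),
    (4, 'http://example.org/verdi');
""")

# Start from one item, then repeatedly add every item that shares an
# identifier with an item already in the set. UNION removes duplicates,
# which is what lets the recursion terminate.
MERGE_SET = """
WITH RECURSIVE merge_set(item_id) AS (
    SELECT ?
    UNION
    SELECT other.item_id
    FROM merge_set
    JOIN item_identifier AS mine ON mine.item_id = merge_set.item_id
    JOIN item_identifier AS other ON other.identifier = mine.identifier
)
SELECT item_id FROM merge_set ORDER BY item_id
"""


def items_to_merge(conn, item_id):
    """Return the ids of all items that should merge with item_id."""
    return [row[0] for row in conn.execute(MERGE_SET, (item_id,))]


print(items_to_merge(conn, 1))
```

Item 3 never shares an identifier with item 1 directly. It is gathered only because item 2 links the two, which is why a single select is not enough.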

Observation: Isn’t indexing the first step towards merging? That is, don’t we have to gather up all the relevant representatives of a subject before we can consider the mechanics of merging?

First seen at myNoSQL.
