Archive for the ‘BerkeleyDB’ Category

Berkeley DB at Yammer: Application Specific NoSQL Data Stores for Everyone

Saturday, May 26th, 2012

Alex Popescu calls attention to Ryan Kennedy of Yammer presenting on transitioning from PostgreSQL to Berkeley DB.

Is that the right direction?

Watch the presentation and see what you think.

Using BerkeleyDB to Create a Large N-gram Table

Friday, May 18th, 2012

Using BerkeleyDB to Create a Large N-gram Table by Richard Marsden.

From the post:

Previously, I showed you how to create N-Gram frequency tables from large text datasets. Unfortunately, when used on very large datasets such as the English language Wikipedia and Gutenberg corpora, memory limitations limited these scripts to unigrams. Here, I show you how to use the BerkeleyDB database to create N-gram tables of these large datasets.

Large datasets such as the Wikipedia and Gutenberg English language corpora cannot be used to create N-gram frequency tables using the previous script due to the script’s large in-memory requirements. The solution is to create the frequency table as a disk-based dataset. For this, the BerkeleyDB database in key-value mode is ideal. This is an open source “NoSQL” library which supports a disk based database and in-memory caching. BerkeleyDB can be downloaded from the Oracle website, and also ships with a number of Linux distributions, including Ubuntu. To use BerkeleyDB from Python, you will need the bsddb3 package. This is included with Python 2.* but is an additional download for Python 3 installations.
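The disk-based frequency table idea can be sketched in a few lines. Richard’s scripts use BerkeleyDB through the bsddb3 package; since that is an extra install, this sketch substitutes Python’s stdlib dbm module, which exposes the same disk-backed key-value interface. The function name and token handling here are illustrative, not taken from his scripts:

```python
import dbm       # stdlib disk-backed key-value store; the post itself uses bsddb3
import os
import tempfile

def add_ngrams(db, tokens, n=2):
    """Increment the on-disk frequency count for every n-gram in tokens."""
    for i in range(len(tokens) - n + 1):
        key = " ".join(tokens[i:i + n]).encode("utf-8")
        if key in db:
            db[key] = str(int(db[key]) + 1).encode("utf-8")
        else:
            db[key] = b"1"

path = os.path.join(tempfile.mkdtemp(), "ngrams.db")
with dbm.open(path, "c") as db:            # "c": create the file if missing
    add_ngrams(db, "to be or not to be".split())
    print(int(db[b"to be"]))               # 2
```

Because the counts live on disk rather than in a Python dict, the same loop can be fed corpus-sized token streams without hitting the in-memory limits the post describes; a real BerkeleyDB build adds the in-memory caching layer on top.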

Richard promises to make the resulting data sets available as an Azure service. Sample code, etc., will be posted to his blog.

Another Wikipedia-based analysis.

Using Oracle Berkeley DB as a NoSQL Data Store

Tuesday, October 4th, 2011

I saw this on Twitter but waited until I could confirm with documentation I knew to exist on an Oracle website. 😉

I take this as a sign that storage, query, and retrieval technology may be about to undergo a fundamental change. Unlike “big data,” which is just that, data that requires a lot of storage, how we store, query, and retrieve data is much more fundamental.

BerkeleyDB as a storage engine may be a clue to future changes. What if there were a common substrate for database engines, SQL, NoSQL, Graph, etc.? Onto which you imposed whatever higher level operations you wished to perform? Done with a copy-on-write mechanism, so every view is persisted across the data set.
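As a toy illustration of that copy-on-write idea (the class and method names are invented for this example, not from any BerkeleyDB API), each write could produce a new snapshot while every older view stays readable:

```python
class CowStore:
    """Toy copy-on-write key-value store: a write returns a new snapshot,
    so every earlier view of the data set remains intact and readable."""

    def __init__(self, data=None):
        self._data = data if data is not None else {}

    def set(self, key, value):
        new_data = dict(self._data)   # copy on write: never mutate in place
        new_data[key] = value
        return CowStore(new_data)

    def get(self, key):
        return self._data.get(key)

v1 = CowStore().set("engine", "SQL")
v2 = v1.set("engine", "Graph")        # a later view imposes its own value
print(v1.get("engine"), v2.get("engine"))  # SQL Graph
```

Each higher-level engine would read through its own snapshot, so SQL and Graph views of the same substrate never clobber one another.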

A common storage substrate would be a great boon to everyone. Think of three-dimensional or even crystalline storage, which isn’t that far away. Now would be a good time for the major vendors to start working toward a common substrate for database engines.