Archive for the ‘Sharding’ Category


Tuesday, May 29th, 2012

I stumbled upon CUBRID via its Important Facts to Know about CUBRID page, where the first entry reads:

Naming Conventions:

The name of this DBMS is CUBRID, written in capital letters, and not Cubrid. We would appreciate much if you followed this naming conventions. It should be fairly simple to remember, itsn’t it!?

Got my attention!

Not for a lack of projects with “attitude” on the Net, but because this is a project with “attitude” that expresses it cleverly, not just offensively.

Features of CUBRID:

Here are the key features that make CUBRID the most optimized open source database management system:

This is the first time I have seen CUBRID.

It does promise a release supporting sharding in June 2012.

The documentation posits extensions to the relational data model:

Extending the Relational Data Model


For the relational data model, it is not allowed that a single column has multiple values. In CUBRID, however, you can create a column with several values. For this purpose, collection data types are provided in CUBRID. The collection data type is mainly divided into SET, MULTISET and LIST; the types are distinguished by duplicated availability and order.

  • SET : A collection type that does not allow the duplication of elements. Elements are stored without duplication after being sorted regardless of their order of entry.
  • MULTISET : A collection type that allows the duplication of elements. The order of entry is not considered.
  • LIST : A collection type that allows the duplication of elements. Unlike with SET and MULTISET, the order of entry is maintained.
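
To make the three collection types concrete, here is a minimal Python sketch of the semantics described in the list above. It is not CUBRID syntax or any CUBRID API; the helper names `as_set`, `as_multiset` and `as_list` are hypothetical and exist only for illustration.

```python
from collections import Counter


def as_set(values):
    """SET: duplicates are dropped and elements are stored sorted,
    regardless of the order in which they were entered."""
    return sorted(set(values))


def as_multiset(values):
    """MULTISET: duplicates are kept, but entry order is not significant.
    A Counter compares equal whenever the element counts match."""
    return Counter(values)


def as_list(values):
    """LIST: duplicates are kept and entry order is preserved."""
    return list(values)


entered = ["c", "a", "b", "a"]

print(as_set(entered))       # sorted, duplicate "a" removed
print(as_multiset(entered))  # counts per element, order ignored
print(as_list(entered))      # exactly as entered
```

Two multisets built from the same elements in a different order compare equal here, while two lists would not, which is the distinction the documentation draws between MULTISET and LIST.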


Inheritance is a concept to reuse columns and methods of a parent table in those of child tables. CUBRID supports reusability through inheritance. By using inheritance provided by CUBRID, you can create a parent table with some common columns and then create child tables inherited from the parent table with some unique columns added. In this way, you can create a database model which can minimize the number of columns.
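
As a rough analogy only, the idea of a child table reusing a parent table's columns can be sketched with Python dataclasses. The `Person` and `Employee` "tables" and their columns are hypothetical, and this says nothing about how CUBRID declares or stores inherited tables.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    # Hypothetical parent "table": columns shared by every child.
    name: str
    email: str


@dataclass
class Employee(Person):
    # Hypothetical child "table": inherits name and email, adds its own column.
    salary: int


def column_names(table):
    """Return the column (field) names of a table-like dataclass, parent columns first."""
    return [f.name for f in fields(table)]


print(column_names(Person))
print(column_names(Employee))
```

The child declares only one column of its own, yet exposes all three, which is the reuse the documentation is describing.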


In a relational database, the reference relationship between tables is defined as a foreign key. If the foreign key consists of multiple columns or the size of the key is significantly large, the performance of join operations between tables will be degraded. However, CUBRID allows the direct use of the physical address (OID) where the records of the referred table are located, so you can define the reference relationship between tables without using join operations.

That is, in an object-oriented database, you can create a composition relation where one record has a reference value to another by using the column displayed in the referred table as a domain (type), instead of referring to the primary key column from the referred table.
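
The contrast between a foreign key and a direct reference can be sketched in Python. This is an analogy under stated assumptions: a Python object reference stands in for the OID, the `Department` and `Employee*` records are hypothetical, and the linear scan is a deliberately naive stand-in for a join.

```python
from dataclasses import dataclass


@dataclass
class Department:
    dept_id: int
    name: str


@dataclass
class EmployeeWithKey:
    name: str
    dept_id: int  # foreign key: a value that must be matched against Department.dept_id


@dataclass
class EmployeeWithRef:
    name: str
    dept: Department  # direct reference, standing in for an OID-typed column


departments = [Department(1, "Research"), Department(2, "Sales")]


def dept_name_via_join(employee, departments):
    """Foreign-key style: search the referred table for the matching key."""
    for department in departments:
        if department.dept_id == employee.dept_id:
            return department.name
    return None


def dept_name_via_reference(employee):
    """Reference style: follow the stored reference, no lookup needed."""
    return employee.dept.name


by_key = EmployeeWithKey("Ann", 2)
by_ref = EmployeeWithRef("Ann", departments[1])

print(dept_name_via_join(by_key, departments))
print(dept_name_via_reference(by_ref))
```

Both calls reach the same department; the second does so without comparing key values, which is the saving the documentation claims for OID references.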

Suggestions/comments on what to try first?

Lucene-3759: Support joining in a distributed environment

Thursday, February 9th, 2012

Support joining in a distributed environment.

From the description:

Add two more methods in JoinUtil to support joining in a distributed manner.

  • Method to retrieve all from values.
  • Method to create a TermsQuery based on a set of from terms.

With these two methods distributed joining can be supported following these steps:

  1. Retrieve from values from each shard
  2. Merge the retrieved from values.
  3. Create a TermsQuery based on the merged from terms and send this query to all shards.
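
The three steps above can be sketched in plain Python. This is not Lucene code and does not use JoinUtil: shards are modeled as lists of dictionaries, `terms_query` is a hypothetical stand-in for a TermsQuery, and the field names `offer_for` and `id` are invented for the example. For brevity the sketch also omits the query that would normally select which "from" documents take part in the join.

```python
def collect_from_values(shard, from_field):
    """Step 1: gather the distinct 'from' values held by a single shard."""
    return {doc[from_field] for doc in shard if from_field in doc}


def merge_from_values(per_shard_values):
    """Step 2: merge the values retrieved from every shard into one set."""
    merged = set()
    for values in per_shard_values:
        merged |= values
    return merged


def terms_query(shard, to_field, terms):
    """Step 3 (per shard): return documents whose 'to' field is one of the merged terms."""
    return [doc for doc in shard if doc.get(to_field) in terms]


def distributed_join(shards, from_field, to_field):
    """Run all three steps and concatenate the per-shard results."""
    merged = merge_from_values(collect_from_values(s, from_field) for s in shards)
    results = []
    for shard in shards:
        results.extend(terms_query(shard, to_field, merged))
    return results


# Each offer lives on a different shard from the product it points to.
shards = [
    [{"id": "p1", "type": "product"}, {"offer_for": "p2", "price": 10}],
    [{"id": "p2", "type": "product"}, {"offer_for": "p1", "price": 7}],
]

joined = distributed_join(shards, from_field="offer_for", to_field="id")
print(joined)
```

With this data a join run separately inside each shard finds nothing, because no shard holds both an offer and its product. Merging the from values first is what lets every shard answer for terms that originated elsewhere.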

Topic maps that have been split into shards could hold values in separate shards that would trigger merging if they were present in a single shard.

This appears to be a way to address that issue.
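
The same collect-and-merge idea can be sketched for topics, assuming (purely for illustration) that each topic is a dictionary with an `id` and a set of subject `identifiers`, and that two topics should merge when they share an identifier. The topic layout, the function name and the example.org identifiers are all hypothetical.

```python
from collections import defaultdict


def cross_shard_merge_candidates(shards):
    """Map each subject identifier to the (shard number, topic id) pairs that
    carry it, keeping only identifiers found on more than one topic."""
    seen = defaultdict(set)
    for shard_no, topics in enumerate(shards):
        for topic in topics:
            for identifier in topic["identifiers"]:
                seen[identifier].add((shard_no, topic["id"]))
    return {
        identifier: sorted(places)
        for identifier, places in seen.items()
        if len(places) > 1
    }


shards = [
    [{"id": "t1", "identifiers": {"http://example.org/subject/lucene"}}],
    [{"id": "t9", "identifiers": {"http://example.org/subject/lucene",
                                  "http://example.org/subject/search"}}],
]

candidates = cross_shard_merge_candidates(shards)
print(candidates)
```

Topics `t1` and `t9` sit on different shards, so neither shard alone would notice that they identify the same subject; only the merged view of identifiers reveals the pending merge.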

Time spent with Lucene is time well spent.

On Sharding Graph Databases

Thursday, February 17th, 2011

On Sharding Graph Databases

Jim Webber starts a discussion on sharding graph databases.

I am interested to learn how the lessons here can be applied to sharding topic maps.

How Sharding Works – Presentation – 4 Feb. 2010

Thursday, January 27th, 2011

How Sharding Works, a presentation by Kristina Chodorow, author of MongoDB: The Definitive Guide.

Date: 4 Feb. 2010


Of interest to topic maps that partition topics.

Last night, after I wrote a draft of this post, I thought that sharding would interfere with arbitrary merging of any topic with any other topic.

OK, so what was the question?

True, sharding will make merging of arbitrary topics in a topic map more costly (if possible at all), but how often is completely unconstrained merging an actual requirement?

I suspect that most topic map projects, other than theoretical ones, already know what merging they are interested in and how those subjects are going to be identified.

Allowances for additional identifications of subjects should be made, but that is a matter of careful design of your topic map.

Suggestion: Have merging specified just like any other requirement. What is expected? What are the criteria for success? What allowances need to be made for future expansion?