Sharding « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 29, 2012

CUBRID

Filed under: CUBRID,NoSQL,Sharding — Patrick Durusau @ 2:41 pm

I stumbled upon CUBRID via its Important Facts to Know about CUBRID page, where the first entry reads:

Naming Conventions:

The name of this DBMS is CUBRID, written in capital letters, and not Cubrid. We would appreciate much if you followed this naming conventions. It should be fairly simple to remember, itsn’t it!?

Got my attention!

Not because projects with “attitude” are scarce on the Net, but because this is a project with “attitude” that expresses it cleverly, not just offensively.

Features of CUBRID:

Here are the key features that make CUBRID the most optimized open source database management system:

High Availability (HA) – Probably the most important feature in any database management system.

Powerful backup features – CUBRID open source database comes with easy to use backup/restore mechanism;

High Performance – Powerful optimizations and features for increased performance;

Java Stored Procedures – You may write CUBRID powered applications in a variety of languages;

CUBRID Manager – Easy and Secure administration of all your CUBRID instances regardless of their location;

CUBRID Query Browser – Lightweight version of CUBRID Manager;

CUBRID Demo Applications – Try CUBRID Applications directly online with no installation required;

This is the first time I have seen CUBRID.

The project does promise a release supporting sharding in June 2012.

The documentation posits extensions to the relational data model:

Extending the Relational Data Model

Collection

For the relational data model, it is not allowed that a single column has multiple values. In CUBRID, however, you can create a column with several values. For this purpose, collection data types are provided in CUBRID. The collection data type is mainly divided into SET, MULTISET and LIST; the types are distinguished by duplicated availability and order.

SET : A collection type that does not allow the duplication of elements. Elements are stored without duplication after being sorted regardless of their order of entry.

MULTISET : A collection type that allows the duplication of elements. The order of entry is not considered.

LIST : A collection type that allows the duplication of elements. Unlike with SET and MULTISET, the order of entry is maintained.
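A rough Python sketch of the three behaviors described above may help when comparing them. It is not CUBRID code: the function names are hypothetical, and Python's built-in types only approximate the semantics of the quoted definitions.

```python
from collections import Counter


def as_set(values):
    """SET-like: duplicates dropped, elements kept sorted regardless of entry order."""
    return sorted(set(values))


def as_multiset(values):
    """MULTISET-like: duplicates kept (as counts), entry order not considered."""
    return Counter(values)


def as_list(values):
    """LIST-like: duplicates kept, entry order preserved."""
    return list(values)


entered = ["c", "a", "b", "a"]

print(as_set(entered))  # ['a', 'b', 'c']
# Two multisets with the same elements compare equal whatever the entry order.
print(as_multiset(entered) == as_multiset(["a", "a", "b", "c"]))  # True
print(as_list(entered))  # ['c', 'a', 'b', 'a']
```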

Inheritance

Inheritance is a concept to reuse columns and methods of a parent table in those of child tables. CUBRID supports reusability through inheritance. By using inheritance provided by CUBRID, you can create a parent table with some common columns and then create child tables inherited from the parent table with some unique columns added. In this way, you can create a database model which can minimize the number of columns.
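As a loose analogy only, the parent/child column reuse described above resembles class inheritance in Python. The table and column names here are invented for illustration and say nothing about CUBRID's own syntax.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """Hypothetical parent table holding the common columns."""
    name: str
    birth_year: int


@dataclass
class Staff(Person):
    """Hypothetical child table: inherits the parent's columns and adds one."""
    salary: int


def column_names(table):
    """Return the column (field) names of a table-like dataclass, parent columns first."""
    return [f.name for f in fields(table)]


print(column_names(Person))  # ['name', 'birth_year']
print(column_names(Staff))   # ['name', 'birth_year', 'salary']
```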

Composition

In a relational database, the reference relationship between tables is defined as a foreign key. If the foreign key consists of multiple columns or the size of the key is significantly large, the performance of join operations between tables will be degraded. However, CUBRID allows the direct use of the physical address (OID) where the records of the referred table are located, so you can define the reference relationship between tables without using join operations.

That is, in an object-oriented database, you can create a composition relation where one record has a reference value to another by using the column displayed in the referred table as a domain (type), instead of referring to the primary key column from the referred table.
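The contrast between a foreign-key lookup and a direct reference can be sketched in Python. This is an analogy under stated assumptions, not CUBRID's OID mechanism: the record types and helper names are hypothetical, and a Python object reference merely stands in for a physical address.

```python
from dataclasses import dataclass


@dataclass
class Department:
    dept_id: int
    name: str


@dataclass
class EmployeeByKey:
    """Relational style: stores the key of the referred record."""
    name: str
    dept_id: int


@dataclass
class EmployeeByRef:
    """Composition style: stores a direct reference to the referred record."""
    name: str
    dept: Department


def dept_name_via_join(employee, departments_by_id):
    """Resolve the department through a key lookup (the 'join' step)."""
    return departments_by_id[employee.dept_id].name


def dept_name_via_reference(employee):
    """Follow the stored reference directly; no lookup table is needed."""
    return employee.dept.name


research = Department(10, "Research")
departments_by_id = {research.dept_id: research}

print(dept_name_via_join(EmployeeByKey("Kim", 10), departments_by_id))  # Research
print(dept_name_via_reference(EmployeeByRef("Lee", research)))          # Research
```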

Suggestions/comments on what to try first?


February 9, 2012

Lucene-3759: Support joining in a distributed environment

Filed under: Lucene,Query Expansion,Sharding — Patrick Durusau @ 4:26 pm

Support joining in a distributed environment.

From the description:

Add two more methods in JoinUtil to support joining in a distributed manner.

Method to retrieve all from values.

Method to create a TermsQuery based on a set of from terms.

With these two methods distributed joining can be supported following these steps:

Retrieve from values from each shard.

Merge the retrieved from values.

Create a TermsQuery based on the merged from terms and send this query to all shards.
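The three steps above can be sketched in plain Python. This is not Lucene's JoinUtil API: shards are modeled as lists of dictionaries, a set of terms stands in for a TermsQuery, and every function name is hypothetical.

```python
def collect_from_values(shard, from_field, from_filter):
    """Step 1: gather the 'from' values of matching documents on one shard."""
    return {doc[from_field] for doc in shard if from_field in doc and from_filter(doc)}


def merge_from_values(per_shard_values):
    """Step 2: merge the values retrieved from every shard into one set."""
    merged = set()
    for values in per_shard_values:
        merged |= values
    return merged


def run_terms_query(shard, to_field, terms):
    """Step 3 (per shard): return documents whose 'to' field is one of the terms."""
    return [doc for doc in shard if doc.get(to_field) in terms]


def distributed_join(shards, from_field, to_field, from_filter):
    """Run all three steps across every shard and concatenate the hits."""
    terms = merge_from_values(
        collect_from_values(shard, from_field, from_filter) for shard in shards
    )
    hits = []
    for shard in shards:
        hits.extend(run_terms_query(shard, to_field, terms))
    return hits


shards = [
    [{"type": "comment", "article_id": "a1", "text": "sharding"},
     {"type": "article", "id": "a2", "title": "Graphs"}],
    [{"type": "article", "id": "a1", "title": "Joins"},
     {"type": "comment", "article_id": "a2", "text": "lucene"}],
]


def mentions_sharding(doc):
    return doc.get("type") == "comment" and "sharding" in doc.get("text", "")


# The matching comment lives on shard 0, the article it points to on shard 1.
print(distributed_join(shards, "article_id", "id", mentions_sharding))
# [{'type': 'article', 'id': 'a1', 'title': 'Joins'}]
```

A join run separately on each shard would find nothing in this example, because the comment and its article are stored on different shards.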

A topic map that has been split into shards could hold values on different shards that would trigger merging if they were present in a single shard.

This appears to be a way to address that issue.

Time spent with Lucene is time well spent.


February 17, 2011

On Sharding Graph Databases

Filed under: Graphs,Sharding — Patrick Durusau @ 6:50 am


Jim Webber starts a discussion on sharding graph databases.

I am interested to learn how the lessons here can be applied to sharding topic maps.


January 27, 2011

How Sharding Works – Presentation – 4 Feb. 2010

Filed under: NoSQL,Sharding,Topic Maps — Patrick Durusau @ 7:54 am

How Sharding Works, a presentation by Kristina Chodorow, author of MongoDB: The Definitive Guide.

Date: 4 Feb. 2010

Of interest to topic maps that partition topics.

I thought last night after I wrote a draft of this post that sharding would interfere with arbitrary merging of any topic with any other topic.

OK, so what was the question?

True, sharding will make merging of arbitrary topics in a topic map more costly (if it remains possible at all), but how often is completely unconstrained merging an actual requirement?

I suspect that most topic map projects, other than theoretical ones, already know what merging they are interested in and how those subjects are going to be identified.
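One way to picture this, as a sketch only: if the merging rule is known in advance (here, assumed to be "same subject identifier"), topics can be routed to shards by that identifier so that merge candidates always land together. The topic structure and function names are hypothetical and do not come from any topic map engine.

```python
import hashlib


def shard_for(subject_identifier, shard_count):
    """Pick a shard deterministically from the subject identifier."""
    digest = hashlib.sha256(subject_identifier.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def add_topic(shards, subject_identifier, names):
    """Store a topic, merging with any topic on the shard that shares its identifier."""
    shard = shards[shard_for(subject_identifier, len(shards))]
    if subject_identifier in shard:
        shard[subject_identifier] |= set(names)  # merge: union of the names
    else:
        shard[subject_identifier] = set(names)


shards = [{}, {}, {}]  # each shard maps subject identifier -> set of names
add_topic(shards, "http://example.org/subject/lucene", {"Lucene"})
add_topic(shards, "http://example.org/subject/lucene", {"Apache Lucene"})

# Both topics were routed to the same shard and merged there.
print(sum(len(shard) for shard in shards))  # 1
```

The scheme only covers merging on the single identifier used for routing; a topic that carries several identifiers would need additional design.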

Allowances for additional identifications of subjects should be made but that is a matter of careful design of your topic map.

Suggestion: Have merging specified just like any other requirement. What is expected? What are the criteria for success? What allowances need to be made for future expansion?
