Archive for the ‘Normalization’ Category

Neo4j: The ‘thinking in graphs’ curve

Thursday, November 28th, 2013

Neo4j: The ‘thinking in graphs’ curve by Mark Needham

From the post:

In a couple of Neo4j talks I’ve done recently I’ve been asked how long it takes to get used to modelling data in graphs and whether I felt it’s simpler than alternative approaches.

My experience of ‘thinking in graphs’™ closely mirrors what I believe is a fairly common curve when learning technologies which change the way you think:

Learning curve for graphs

There is an initial stage where it seems really hard because it’s different to what we’re used to and at this stage we might want to go back to what we’re used to.

If we persevere, however, we will get over that hump and after that it will be difficult to imagine another way of modelling data – at least in domains where the connections between different bits of data are important.

Once we’re over the hump data modelling should seem like fun and we’ll start looking to see whether we can use graphs to solve other problems we’ve got.

I wasn’t sure whether modelling in graphs is simpler than alternative approaches so as a thought experiment I decided to see what part of my football graph would look like if it was modelled in a relational database.

See Mark’s post for the comparison between a normalized relational database model versus a graph model.

I suspect Mark is right about the difficulty of moving from a fully normalized relational paradigm to graphs, but no one grows up thinking in normalized relational databases.

Remember your first encounter with databases (mine was DBase III or was that DBase II?), the normalized relational paradigm seemed unnatural. On a par with unmentionable practices.

Here’s an experiment you can try with non-IT and IT people.

Show both groups Mark’s diagrams and ask them which one is easier to understand?

I think you know where my money is riding. 😉

Could be useful empirical knowledge in terms of preparing educational materials for the respective groups.

Data modeling … with graphs

Wednesday, November 7th, 2012

Data modeling … with graphs by Peter Bell.

Nothing surprising for topic map users but a nice presentation on modeling for graphs.

For Neo4j, unlike topic maps, you have to normalize your data before entering it into the graph.

That is if you want one node per subject.

Depends on your circumstances if that is worthwhile.

Amazing things have been done with normalized data in relational databases.

Assuming you want to pay the cost of normalization, which can include a lack of interoperability with others, errors in conversion, brittleness in the face of changing models, etc.

Ad Hoc Normalization II

Wednesday, November 30th, 2011

After writing Ad Hoc Normalization it occurred to me that topic maps offer another form of “ad hoc” normalization.

I don’t know what else you would call merging two topic maps together?

Try that with two relational databases.

So, not only can topic maps maintain “internal” ad hoc normalization but also “external” ad hoc normalization with data sources that were not present at the time of the creation of a topic map.

But there are other forms of normalization.

Recall that Lars Marius talks about the reduction of information items that represent the same subjects. That can only occur when there is a set of information items that obey the same data model and usually the same syntax. I would call that information model normalization. That is whatever is supported by a particular information model can be normalized.

For relational databases that is normalization by design and for topic maps that is ad hoc normalization (although some of it could be planned in advance as well).

But there is another form of normalization. A theoretical construct but subject-based normalization. I say it is theoretical because in order to instantiate a particular case you have to cross over into the land of information model normalization.

I find subject-based normalization quite useful, mostly because as human designers/authors, we are not constrained by the limits of our machines. We can hold contradictory ideas at the same time without requiring a cold or hot reboot. Subject-based normalization allows us to communicate with other users what we have seen in data and how we need to process it for particular needs.

Ad Hoc Normalization

Tuesday, November 29th, 2011

I really should not start reading Date over the weekend. It puts me in a relational frame of mind and I start thinking of explanations of topic maps in terms of the relational model.

For example, take his definition of:

First normal form: A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute. (page 358)

Second normal form: (definition assuming only one candidate key, which we assume is the primary key): a relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key. (page 361)

Third normal form: (definition assuming only one candidate key, which we assume is the primary key): A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key. (page 363)

Third normal form (even more informal definition): A relvar is in third normal form (3NF) if and only if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent values that describe that entity in some way.

Does that mean that topic maps support ad hoc normalization? That is we don’t have to design in normalization before we start writing the topic map but can decide on what subjects need to be “normalized,” that is represented by topics that read to a single representative, after we have started writing the topic map.

Try that with a relational database and tables of any complexity. If you don’t get it right at the design stage, fixing it becomes more expensive as time goes by.

Not a “dig” at relational databases. If your domain is that slow changing and other criteria point to a relational solution, by all means, use one. Performance numbers are hard to beat.

On the other hand, if you need “normalization” an yet you have a rapidly changing environment that is subject to exploration and mappings across domains, you should give topic maps a hard look. Ask for “Ad Hoc Normalization” by name. 😉

PS: I suspect this is what Lars Marius meant by Topic Maps Data Model (TMDM) 6. Merging, 6.1 General:

A central operation in Topic Maps is that of merging, a process applied to a topic map in order to eliminate redundant topic map constructs in that topic map. This clause specifies in which situations merging shall occur, but the rules given here are insufficient to ensure that all redundant information is removed from a topic map.

Any change to a topic map that causes any set to contain two information items equal to each other shall be followed by the merging of those two information items according to the rules given below for the type of information item to which the two equal information items belong.

But I wasn’t “hearing” “…eliminate redundant topic maps constructs…” as “normalization.”