Data Integration: The Relational Logic Approach pays homage to what is called the N-squared problem. The premise of N-squared for data integration is that every distinct identification must be mapped to every other distinct identification. Here is a graphic of the N-squared problem.
There are two usual responses, depending upon the proposed solution.
First, get thee to a master schema (probably the most common). That is, map every distinct data source to a common schema, and require all clients to interact with that one schema. Case closed. Except data sources come and go, as do clients, so there is maintenance overhead. And agreeing on each update to the schema can take time.
Second, no system integrates every other possible source of data, so the fear of N-squared is greatly exaggerated. Not unlike the sudden rush for “big data” solutions whether the client has “big data” or not. Who would want to admit to having “medium” or even “small” data?
The third response is that of topic maps. The assumption that every identification must map to every other identification means things get ugly in a hurry. But topic maps question the premise of the N-squared problem: that every identification must map to every other identification.
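To put rough numbers on "ugly in a hurry," here is a minimal Python sketch that counts the mappings the N-squared premise would require. The short labels are hypothetical stand-ins for identifications, and the choice between counting unordered pairs and directed mappings is my assumption about how one might tally them.

```python
from itertools import combinations

def pairwise_mappings(identifiers):
    """Return every unordered pair of distinct identifiers."""
    return list(combinations(identifiers, 2))

# Hypothetical labels standing in for five identifications of one subject.
ids = ["en", "de", "fr", "it", "eo"]
pairs = pairwise_mappings(ids)

print(len(pairs))      # unordered pairs, N*(N-1)/2  -> 10
print(len(pairs) * 2)  # directed mappings, N*(N-1)  -> 20

# The pair count grows quadratically as identifications are added.
for n in (5, 50, 500):
    print(n, n * (n - 1) // 2)
```

Either way of counting grows with the square of N, which is the fear the premise trades on.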
Here is an illustration of how five separate topic maps, with five different identifications of a popular comic book character (Superman), can be combined and yet avoid the N-Squared problem. In fact, topic maps offer an N+1 solution to the problem.
Each snippet, written in Compact Topic Map (CTM) syntax, represents a separate topic map.
en-superman
http://en.wikipedia.org/wiki/Super_man ;
- "Superman" ;
- altname: "Clark Kent" .
***
de-superman
http://de.wikipedia.org/wiki/Superman ;
- "Superman" ;
- birthname: "Kal-El" .
***
fr-superman
http://fr.wikipedia.org/wiki/Superman ;
- "Superman" ;
birthplace: "Krypton" .
***
it-superman
http://it.wikipedia.org/wiki/Superman ;
- "Superman" ;
- altname: "Man of Steel" .
***
eo-superman
http://eo.wikipedia.org/wiki/Superman ;
- "Superman" ;
- altname: "Clark Joseph Kent" .
Copy them into a common file, superman-N-squared.ctm, and nothing happens. That’s because they all have different subject identifiers. What if I add the following topic to the file/topic map:
superman
http://en.wikipedia.org/wiki/Super_man ;
http://de.wikipedia.org/wiki/Superman ;
http://fr.wikipedia.org/wiki/Superman ;
http://it.wikipedia.org/wiki/Superman ;
http://eo.wikipedia.org/wiki/Superman .
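The result is the file superman-N-squared-solution.ctm, in which all six topics merge into one, because the new topic shares a subject identifier with each of the other five.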
Ooooh.
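For readers without a topic map engine at hand, the effect can be imitated in a few lines of Python. This is only a sketch under stated assumptions: it is not a CTM parser, it models a topic as a set of subject identifiers plus a set of (type, value) properties, and it applies just one rule (topics that share a subject identifier merge). The function name merge_topics and the dictionary layout are mine, not part of any topic map library.

```python
def merge_topics(topics):
    """Merge topics that share at least one subject identifier.

    Each topic is a dict with 'ids' (set of URLs) and 'props'
    (set of (type, value) tuples). Simplified model, one rule only.
    """
    merged = []  # invariant: topics in this list have disjoint 'ids'
    for topic in topics:
        current = {"ids": set(topic["ids"]), "props": set(topic["props"])}
        remaining = []
        for other in merged:
            if other["ids"] & current["ids"]:
                # Shared identifier: fold the other topic into this one.
                current["ids"] |= other["ids"]
                current["props"] |= other["props"]
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged

EN = "http://en.wikipedia.org/wiki/Super_man"
DE = "http://de.wikipedia.org/wiki/Superman"
FR = "http://fr.wikipedia.org/wiki/Superman"
IT = "http://it.wikipedia.org/wiki/Superman"
EO = "http://eo.wikipedia.org/wiki/Superman"

five = [
    {"ids": {EN}, "props": {("name", "Superman"), ("altname", "Clark Kent")}},
    {"ids": {DE}, "props": {("name", "Superman"), ("birthname", "Kal-El")}},
    {"ids": {FR}, "props": {("name", "Superman"), ("birthplace", "Krypton")}},
    {"ids": {IT}, "props": {("name", "Superman"), ("altname", "Man of Steel")}},
    {"ids": {EO}, "props": {("name", "Superman"), ("altname", "Clark Joseph Kent")}},
]

# The extra topic carries all five identifiers and nothing else.
bridge = {"ids": {EN, DE, FR, IT, EO}, "props": set()}

before = merge_topics(five)
after = merge_topics(five + [bridge])

print(len(before))  # 5: no identifiers in common, so nothing merges
print(len(after))   # 1: the extra topic pulls all five together
print(sorted(after[0]["props"]))
```

Five topics plus one bridging topic is the N+1 of the title: no pairwise mapping between the five was ever written down.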
Or an author knows just one other identifier. So long as any group of authors uses at least one common identifier between any two maps, their separate topic maps merge. (Ordering of the merges may be an issue.)
Another way to say that is that the trigger for merging of identifications is decentralized.
Which gives you a lot more eyes on the data, potential subjects and relationships between subjects.
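The decentralized case, where no single author knows all five identifiers, can be sketched the same way. In the hypothetical setup below, each author adds just one identifier from a neighboring map, and a small union-find structure groups topics that are linked through any chain of shared identifiers. The function name and the chain arrangement are assumptions for illustration. The sketch only checks the final grouping under every processing order in this simplified model, and says nothing about how a particular engine handles other merge details.

```python
from itertools import permutations

def group_by_shared_ids(topics):
    """Group topic names linked, directly or via a chain, by shared identifiers."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every topic to each identifier it uses.
    for name, ids in topics:
        for ident in ids:
            union(("topic", name), ("id", ident))

    groups = {}
    for name, _ in topics:
        groups.setdefault(find(("topic", name)), set()).add(name)
    return {frozenset(g) for g in groups.values()}

EN = "http://en.wikipedia.org/wiki/Super_man"
DE = "http://de.wikipedia.org/wiki/Superman"
FR = "http://fr.wikipedia.org/wiki/Superman"
IT = "http://it.wikipedia.org/wiki/Superman"
EO = "http://eo.wikipedia.org/wiki/Superman"

# Hypothetical: each author knows their own identifier plus one neighbor's.
chain = [
    ("en-superman", {EN, DE}),
    ("de-superman", {DE, FR}),
    ("fr-superman", {FR, IT}),
    ("it-superman", {IT, EO}),
    ("eo-superman", {EO}),
]

# Try every processing order and collect the distinct outcomes.
outcomes = {frozenset(group_by_shared_ids(list(order)))
            for order in permutations(chain)}

print(len(outcomes))                   # number of distinct groupings seen
print(group_by_shared_ids(chain))      # the grouping itself
```

In this model, four locally known links are enough to bring all five topics together, and nobody had to coordinate with anybody else.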
PS: Did you know that the English and German versions give Superman’s cover name as “Clark Kent,” while the French, Italian and Esperanto versions give his cover name as “Clark Joseph Kent”?
PPS: The files are both here, Superman-Semantics-01.zip.