Archive for the ‘Legends’ Category

Appropriating IT: Glue Steps [Gluing Subject Representatives Together?]

Tuesday, October 9th, 2012

Appropriating IT: Glue Steps by Tony Hirst.

Over the years, I’ve been fortunate enough to have been gifted some very evocative, and powerful, ideas that immediately appealed to me when I first heard them and that I’ve been able to draw on, reuse and repurpose over and over again. One such example is “glue logic”, introduced to me by my original OU PhD supervisor George Kiss. The idea of glue logic is to provide a means by which two digital electronic circuits (two “logic” circuits) that don’t share a common interface can be “glued” together.

(diagrams and other material omitted)

This idea is powerful enough in its own right, but there was a second bit to it that made it really remarkable: the circuitry typically used to create the glue logic was a device known as a Field Programmable Gate Array, or FPGA. This is a type of digital circuit whose logical function can be configured, or programmed. That is, I can take my “shapeless” FPGA, and programme it so that it physically implements a particular digital circuit. Just think about that for a moment… You probably have a vague idea that the same computer can be reprogrammed to do particular things, using some vaguely mysterious and magical thing called software, instructions that computer processors follow in order to do incredible things. With an FPGA, the software actually changes the hardware: there is no processor that “runs a programme”; when you programme an FPGA, you change its hardware. FPGAs are, literally, programmable chips. (If you imagine digital circuits to be like bits of plastic, an FPGA is like polymorph.)

The notion of glue logic has stuck with me for two reasons, I think: firstly, because of what it made possible, the idea of flexibly creating an interface between two otherwise incompatible components; secondly, because of the way in which it could be achieved – using a flexible, repurposable, reprogrammable device – one that you could easily reprogramme if the mapping from one device to another wasn’t quite working properly.

If instead of “don’t share a common interface” you read “semantic diversity” and in place of Field Programmable Gate Array, or FPGA, you read “legend,” to “creat[e] an interface between two otherwise incompatible [subject representatives],” you would think Tony’s post was about the topic maps reference model.

Well, this post is and Tony’s is very close.

Particularly the part about being a “reprogrammable device.”

I can tell you: “black” = “schwarz,” but without more, you won’t be able to rely on or extend that statement.

For that, you need a “reprogrammable device” and some basis on which to do the reprogramming.

Legends anyone?

DSL for the Uninitiated (Legends?)

Saturday, October 1st, 2011

DSL for the Uninitiated by Debasish Ghosh

From the post:

One of the main reasons why software projects fail is the lack of communication between the business users, who actually know the problem domain, and the developers who design and implement the software model. Business users understand the domain terminology, and they speak a vocabulary that may be quite alien to the software people; it’s no wonder that the communication model can break down right at the beginning of the project life cycle.

A DSL (domain-specific language)1,3 bridges the semantic gap between business users and developers by encouraging better collaboration through shared vocabulary. The domain model that the developers build uses the same terminologies as the business. The abstractions that the DSL offers match the syntax and semantics of the problem domain. As a result, users can get involved in verifying business rules throughout the life cycle of the project.

This article describes the role that a DSL plays in modeling expressive business rules. It starts with the basics of domain modeling and then introduces DSLs, which are classified according to implementation techniques. The article then explains in detail the design and implementation of an embedded DSL from the domain of securities trading operations.

The subject identity and merging requirements of a particular domain are certainly issues where users, who actually know the problem domain, should be in the lead. Moreover, if users object to some merging operation result, that will bring notice to perhaps unintended consequences of an identity or merging rule.

Perhaps the rule is incorrect, perhaps there are assumptions yet to be explored, but the focus in on the user’s understanding of the domain, where it should be (assuming the original coding is correct).

This sounds like a legend to me.

BTW, the comments point to Lisp resources that got to DSLs first (as is the case with most/all programming concepts):

Matthias Felleisen | Thu, 04 Aug 2011 22:26:46 UTC

DSLs have been around in the LISP world forever. The tools for building them and for integrating them into the existing toolchain are far more advanced than in the JAVA world. For an example, see

http://www.ccs.neu.edu/scheme/pubs/#pldi11-thacff for a research-y introduction

or

http://hashcollision.org/brainfudge/ for a hands-on introduction.

You might also want to simply start at the Racket homepage.

Another Word For It at #2,000

Thursday, July 28th, 2011

According to my blogging software this is my 2,000th post!

During the search for content and ideas for this blog I have thought a lot about topic maps and how to explain them.

Or should I say how to explain topic maps without inventing new terminologies or notations? ;-)

Topic maps deal with a familiar problem:

People use different words when talking about the same subject and the same word when talking about different subjects.

Happens in conversations, newspapers, magazines, movies, videos, tv/radio, texts, and alas, electronic data.

The confusion caused by using different words for the same subject and same word for different subjects is a source of humor. (What does “nothing” stand for in Shakespeare’s “Much Ado About Nothing”?)

In searching electronic data, that confusion causes us to miss some data we want to find (different word for the same subject) and to find some data we don’t want (same word but different subject).

When searching old newspaper archives this can be amusing and/or annoying.

Potential outcomes of failure elsewhere:

medical literature injury/death/liability
financial records civil/criminal liability
patents lost opportunities/infringement
business records civil/criminal liability

Solving the problem of different words for the same subject and the same word but different subjects is important.

But how?

Topic maps and other solutions have one thing in common:

They use words to solve the problem of different words for the same subject and the same word but different subjects.

Oops!

The usual battle cry is “if everyone uses my words, we can end semantic confusion, have meaningful interchange for commerce, research, cultural enlightenment and so on and so forth.”

I hate to be the bearer of bad news but what about all the petabytes of data we already have on hand with zettabytes of previous interpretations? With more being added every day and not universal solution in sight? (If you don’t like any of the current solutions, wait a few months and new proposals, schemas, vocabularies, etc., will surface. Or you can take the most popular approach and start your own.)

Proposals to deal with semantic confusion are also frozen in time and place. Unlike the human semantics they propose to sort out, they do not change and evolve.

We have to use the source of semantic difficulty, words, in crafting a solution and our solution has to evolve over time even as our semantics do.

That’s a tall order.

Part of the solution, if you want to call it that, is to recognize when the benefits of solving semantic confusion outweighs the cost of the solution. We don’t need to solve semantic confusion everywhere and anywhere it occurs. In some cases, perhaps rather large cases, it isn’t worth the effort.

That triage of semantic confusion allows us to concentrate on cases where the investment of time and effort are worthwhile. In searching for the Hilton Hotel in Paris I may get “hits” for someone with underwear control issues but so what? Is that really a problem that needs a solution?

On the other hand, being able to resolve semantic confusion, such as underlies different accounting systems for businesses, could give investors a clearer picture of the potential risks and benefits of particular investments. Or doing the same for financial institutions so that regulators can “look down” into regulated systems with some semantic coherence (without requiring identical systems).

Having chosen some semantic confusion to resolve, we then have to choose a method to resolve it.

One method, probably the most popular one, is the “use my (insert vocabulary)” method for resolving semantic confusion. Works and for some cases, may be all that you need. Databases with gigabyte size tables (and larger) operate quite well using this approach. Can become problematic after acquisitions when migration to other database systems is required. Undocumented semantics can prove to be costly in many situations.

Semantic Web techniques, leaving aside the fanciful notion of unique identifiers, do offer the capability of recording additional properties about terms or rather the subjects that terms represent. Problematically though, they don’t offer the capacity to specify which properties are required to distinguish one term from another.

No, I am not about to launch into a screed about why “my” system works better than all the others.

Recognition that all solutions are composed of semantic ambiguity is the most important lesson of the Topic Maps Reference Model (TMRM).

Keys (of key/value pairs) are pointers to subject representatives (proxies) and values may be such references. Other keys and/or values may point to other proxies that represent the same subjects. Which replicates the current dilemma.

The second important lesson of the TMRM is the use of legends to define what key/value pairs occur in a subject representative (proxy) and how to determine two or more proxies represent the same subject (subject identity).

Neither lesson ends semantic ambiguity, nor do they mandate any particular technology or methodology.

They do enable the creation and analysis of solutions, including legends, with an awareness they are all partial mappings, with costs and benefits.

I will continue the broad coverage of this blog on semantic issues but in the next 1,000 posts I will make a particular effort to cover:

  • Ex Parte Declaration of Legends for Data Sources (even using existing Linked Data where available)
  • Suggestions for explicit subject identity mapping in open source data integration software
  • Advances in graph algorithms
  • Sample topic maps using existing and proposed legends

Other suggestions?

Managing Semantic Ambiguity

Wednesday, November 3rd, 2010

Topic maps do not and cannot eliminate semantic ambiguity. What topic maps can do is assist users in managing semantic ambiguity with regard to identification of particular subjects.

Consider the well-known ambiguity of whether a URI is an identifier or an address.

The Topic Maps Data Model (TMDM) provides a way to manage that ambiguity by providing a means to declare if a URI is being used and an identifier or as an address.

That is only “managing” the ambiguity because there is no mechanism to prevent incorrect use of that mechanism, which would result in ambiguity or even having the mechanism mis-understood entirely.

Identification by saying a subject representative (proxy) must have properties X…Xn is a collection of possible ambiguities that an author hopes will be understood by a reader.

Since we are trying to communicate with other people, there isn’t any escape from semantic ambiguity. Ever.

Topic maps provide the ability to offer more complete descriptions of subjects in hopes of being understood by others.

With the ability to add descriptions of subjects from others, offering users a variety of descriptions of the same subject.

We have had episodic forays into “certainty,” the Semantic Web being only one of the more recent failures in that direction. Ambiguity anyone?

Are simplified hadoop interfaces the next web cash cow? – Post

Wednesday, July 14th, 2010

Are simplified hadoop interfaces the next web cash cow? is a question that Brian Breslin is asking these days.

It isn’t that hard to imagine that not only Hadoop interfaces being cash cows but also canned analysis of public date sets that can be incorporated into those interfaces.

But then the semantics question comes back up when you want to join that canned analysis to your own. What did they mean by X? Or Y? Or for that matter, what are the semantics of the data set?

But we can solve that issue by explicit subject identification! Did I hear someone say topic maps? ;-) So our identifications of subjects in public data sets will themselves become a commodity. There could be competing set-similarity analysis of  public data sets.

If a simplified Hadoop interface is the next cash cow, we need to be ready to stuff it with data mapped to subject identifications to make it grow even larger. A large cash cow is a good thing, a larger cash cow is better and a BP-sized cash cow is just about right.

In Praise of Legends (and the TMDM in particular)

Thursday, March 11th, 2010

Legends enable topic maps to have different representations of the same subject. Standard legends, like the Topic Maps Data Model (TMDM), are what enable blind interchange of topic maps.

Legends do a number of things but among the more important, legends define the rules for the contents of subject representatives and the rules for comparing them. The TMDM defines three representatives for subjects, topics, associations and occurrences. It also defines how to compare those representatives to see if they represent the same subjects.

Just to pull one of those rules out, if two or more topics have an equal string in their [subject identifiers] property, the two topics are deemed to represent the same subject. (TMDM 5.3.5 Properties) The [subject identifiers] property is a set so a topic could have two or more different strings in that property to match other topics.

It is the definition of a basis for comparison (see the TMDM for the other rules for comparing topics) of topics that enables the blind interchange of topic maps that follow the TMDM. That is to say that I can author a topic map in XTM (one syntax that follows the TMDM) and reasonably expect that other users will be able to successfully process it.

I am mindful of Robert Cerny’s recent comment on encoding strings as URLs but don’t think that covers the case where the identifications of the subjects are dynamic. That is to say that the strings themselves are composed of strings that are subject to change as additional items are merged into the topic map.

The best use case that comes to mind is that of the current concern in the United States over the non-sharing of intelligence data. You know, someone calls up and says their son is a terrorist and is coming to the United States to commit a terrorist act. That sort of intelligence. That isn’t passed on to anyone. At least anyone who cared enough to share it, I don’t know, with the airlines perhaps?

If I can author a subject identification that includes a previously overlooked source of information, say the parent of a potential terrorist, in addition to paid informants, current/former drug lords, etc. the usual categories, then the lights aren’t simply blinking red there is actual information in addition to the blinking lights.

I really should wait for Robert to make his own arguments but if you think of URLs as simply strings, without any need for resolution, you could compose a dynamic identification, freeze it into a URL, then pass it along to a TMDM based system. You don’t get any addition information but that would be one way to input such information into a TMDM based system. If you control the server you could provide a resolution back into the dynamic subject identification system. (Will have to think about that one.)

I think of it as the TMDM using sets of immutable strings for subject identification and one of the things the TMRM authorizes, but does not mandate, is the use of mutable strings as subject identifiers.