Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 15, 2011

TMQL Canonizer

Filed under: TMQL,TMQL4J — Patrick Durusau @ 6:34 am


This is a new service from the Topic Maps Lab, but absent any documentation, it is hard to say what to expect from it.

For example, I took a query from the rather excellent TMQL tutorials by Sven Krosse (also of the Topic Maps Lab):

%prefix o http://psi.ontopia.net/music/
FOR $topic IN // tm:subject
RETURN
IF $topic ISA o:composer
THEN $topic >> indicators
ELSE $topic / tm:name [0]

Fed it to the canonizer and got this result:

QueryExpression([%prefix, o, http://psi.ontopia.net/music/, FOR, $topic, IN, //, tm:subject, RETURN, IF, $topic, ISA, o:composer, THEN, $topic, >>, indicators, ELSE, $topic, /, tm:name, [, 0, ]])
|–EnvironmentClause([%prefix, o, http://psi.ontopia.net/music/])
| |–PrefixDirective([%prefix, o, http://psi.ontopia.net/music/])
|–FlwrExpression([FOR, $topic, IN, //, tm:subject, RETURN, IF, $topic, ISA, o:composer, THEN, $topic, >>, indicators, ELSE, $topic, /, tm:name, [, 0, ]])
|–ForClause([FOR, $topic, IN, //, tm:subject])
| |–BindingSet([$topic, IN, //, tm:subject])
| |–VariableAssignment([$topic, IN, //, tm:subject])
| |–Variable([$topic])
| |–Content([//, tm:subject])
| |–QueryExpression([//, tm:subject])
| |–PathExpression([//, tm:subject])
| |–PostfixedExpression([//, tm:subject])
| |–SimpleContent([//, tm:subject])
| |–Anchor([tm:subject])
| |–Navigation([<<, types])
| |–StepDefinition([<<, types])
| |–Step([<<, types])
|–ReturnClause([RETURN, IF, $topic, ISA, o:composer, THEN, $topic, >>, indicators, ELSE, $topic, /, tm:name, [, 0, ]])
|–Content([IF, $topic, ISA, o:composer, THEN, $topic, >>, indicators, ELSE, $topic, /, tm:name, [, 0, ]])
|–PathExpression([$topic, ISA, o:composer])
| |–ISAExpression([$topic, ISA, o:composer])
| |–SimpleContent([$topic])
| | |–Anchor([$topic])
| |–SimpleContent([o:composer])
| |–Anchor([o:composer])
|–Content([$topic, >>, indicators])
| |–QueryExpression([$topic, >>, indicators])
| |–PathExpression([$topic, >>, indicators])
| |–PostfixedExpression([$topic, >>, indicators])
| |–SimpleContent([$topic, >>, indicators])
| |–Anchor([$topic])
| |–Navigation([>>, indicators])
| |–StepDefinition([>>, indicators])
| |–Step([>>, indicators])
|–Content([$topic, /, tm:name, [, 0, ]])
|–QueryExpression([$topic, /, tm:name, [, 0, ]])
|–PathExpression([$topic, /, tm:name, [, 0, ]])
|–PostfixedExpression([$topic, /, tm:name, [, 0, ]])
|–SimpleContent([$topic, /, tm:name, [, 0, ]])
|–Anchor([$topic])
|–Navigation([/, tm:name, [, 0, ]])
|–StepDefinition([>>, characteristics, tm:name])
| |–Step([>>, characteristics, tm:name])
| |–Anchor([tm:name])
|–StepDefinition([>>, atomify, [, 0, ]])
|–Step([>>, atomify])
|–FilterPostfix([[, 0, ]])
|–Anchor([0])
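
Read as a rewrite, the tree shows two non-canonical shortcuts being expanded. "// tm:subject" becomes the anchor tm:subject followed by a backward "<< types" step. "/ tm:name" becomes a ">> characteristics tm:name" step followed by ">> atomify". The Python sketch below reproduces only those two rewrites on whitespace-separated tokens. It is a hypothetical illustration inferred from this output, not how TMQL4J is implemented. The function name canonize is made up, and every other shortcut in the 2008 draft is ignored.

```python
def canonize(tokens: list[str]) -> list[str]:
    """Expand only the "//" and "/" shortcuts seen in the parse tree (sketch)."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        has_next = i + 1 < len(tokens)
        if tok == "//" and has_next:
            # "// T" (instances of T) -> "T << types"
            out += [tokens[i + 1], "<<", "types"]
            i += 2
        elif tok == "/" and has_next:
            # "/ N" (characteristics of type N, as values) -> ">> characteristics N >> atomify"
            out += [">>", "characteristics", tokens[i + 1], ">>", "atomify"]
            i += 2
        else:
            # everything else is passed through unchanged
            out.append(tok)
            i += 1
    return out


print(" ".join(canonize("FOR $topic IN // tm:subject".split())))
# FOR $topic IN tm:subject << types
print(" ".join(canonize("$topic / tm:name [0]".split())))
# $topic >> characteristics tm:name >> atomify [0]
```

Applied to "thing / title", the same function yields "thing >> characteristics title >> atomify". That matches the tail of the canonical output quoted in the first comment below.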

OK, so I omitted the prefix on composer for the following query:

%prefix o http://psi.ontopia.net/music/
FOR $topic IN // tm:subject
RETURN
IF $topic ISA composer
THEN $topic >> indicators
ELSE $topic / tm:name [0]

Then I get:

QueryExpression([%prefix, o, http://psi.ontopia.net/music/, FOR, $topic, IN, //, tm:subject, RETURN, IF, $topic, ISA, composer, THEN, $topic, >>, indicators, ELSE, $topic, /, tm:name, [, 0, ]])
|–EnvironmentClause([%prefix, o, http://psi.ontopia.net/music/])
| |–PrefixDirective([%prefix, o, http://psi.ontopia.net/music/])
|–FlwrExpression([FOR, $topic, IN, //, tm:subject, RETURN, IF, $topic, ISA, composer, THEN, $topic, >>, indicators, ELSE, $topic, /, tm:name, [, 0, ]])
|–ForClause([FOR, $topic, IN, //, tm:subject])
| |–BindingSet([$topic, IN, //, tm:subject])
| |–VariableAssignment([$topic, IN, //, tm:subject])
| |–Variable([$topic])
| |–Content([//, tm:subject])
| |–QueryExpression([//, tm:subject])
| |–PathExpression([//, tm:subject])
| |–PostfixedExpression([//, tm:subject])
| |–SimpleContent([//, tm:subject])
| |–Anchor([tm:subject])
| |–Navigation([<<, types])
| |–StepDefinition([<<, types])
| |–Step([<<, types])
|–ReturnClause([RETURN, IF, $topic, ISA, composer, THEN, $topic, >>, indicators, ELSE, $topic, /, tm:name, [, 0, ]])
|–Content([IF, $topic, ISA, composer, THEN, $topic, >>, indicators, ELSE, $topic, /, tm:name, [, 0, ]])
|–PathExpression([$topic, ISA, composer])
| |–ISAExpression([$topic, ISA, composer])
| |–SimpleContent([$topic])
| | |–Anchor([$topic])
| |–SimpleContent([composer])
| |–Anchor([composer])
|–Content([$topic, >>, indicators])
| |–QueryExpression([$topic, >>, indicators])
| |–PathExpression([$topic, >>, indicators])
| |–PostfixedExpression([$topic, >>, indicators])
| |–SimpleContent([$topic, >>, indicators])
| |–Anchor([$topic])
| |–Navigation([>>, indicators])
| |–StepDefinition([>>, indicators])
| |–Step([>>, indicators])
|–Content([$topic, /, tm:name, [, 0, ]])
|–QueryExpression([$topic, /, tm:name, [, 0, ]])
|–PathExpression([$topic, /, tm:name, [, 0, ]])
|–PostfixedExpression([$topic, /, tm:name, [, 0, ]])
|–SimpleContent([$topic, /, tm:name, [, 0, ]])
|–Anchor([$topic])
|–Navigation([/, tm:name, [, 0, ]])
|–StepDefinition([>>, characteristics, tm:name])
| |–Step([>>, characteristics, tm:name])
| |–Anchor([tm:name])
|–StepDefinition([>>, atomify, [, 0, ]])
|–Step([>>, atomify])
|–FilterPostfix([[, 0, ]])
|–Anchor([0])

So then I entered a query that omits the “$” from the second instance of topic:

%prefix o http://psi.ontopia.net/music/
FOR $topic IN // tm:subject
RETURN
IF topic ISA o:composer
THEN $topic >> indicators
ELSE $topic / tm:name [0]

You can enter that one for yourself; there is no substantive change in the result.

By omitting the “$” from all instances of topic, I was finally able to get an “invalid expression” result.

Do note that the following is treated as a valid expression:

%prefix o http://psi.ontopia.net/music/
FOR $topic IN // tm:subject
RETURN
IF topic ISA o:composer
THEN topic >> indicators
ELSE topic / tm:name [0]

A bit more attention to documentation would go a long way to making this a useful project.

*****
PS: From the 2008 TMQL draft:

Examples for invalid variables are x (sigil missing),

10 Comments

  1. Let me add the documentation here:

    Goal: Some people say that the 2008 TMQL draft (aka Barta draft or TMQL@rho) is unreadable because of its concise syntax. However, anyone who takes the time to read the three paragraphs of Chapter 3.1 “Syntax Conventions” (http://isotopicmaps.org/tmql/tmql.html#SyntaxConventions) would know that there are several syntax levels. Among them are the very verbose canonical level and the nice, concise non-canonical level. Those poor people who do not understand queries written at the non-canonical level should be able to obtain the canonical query somehow.

    Solution: http://canonizer.topicmapslab.de/tmql4j-web-ui/ accepts any query as input and transforms all non-canonical parts into their canonical counterparts.

    Example Input:

    benjamin <- author [^ authorship] -> document <- documented -> thing / title

    Corresponding output (when using the 2008 draft):

    benjamin <> types == authorship ] >> players document <> players thing >> characteristics title >> atomify

    These two queries are equivalent and, according to some people (not including myself), the latter is easier to read, easier to write, and easier to understand.

    If we assume that an undisclosed, in-progress 2011 draft had a similar non-canonical level but, e.g., no backward navigation on axes and instead special axes for the corresponding tasks, the output might look something like this:

    benjamin / played-roles author / players document / played-roles documented / players thing

    The title of the documented thing was left out because this draft currently does not have a shortcut for getting the title. The watchful reader may also notice that the association-type-filter is missing here. This is obviously a bug. However, the result shows how a path query may look when using only forward navigation.

    The parse tree is just debug output of TMQL4J to help the user understand what TMQL4J does internally. It is not part of the above-stated goal that led to the implementation of the service.

    End of documentation.

    Conclusion: Yes, documentation would have been good, because you completely missed the raison d’être of the *canonizer*, which is producing a *canonized* query. The parse tree is just a nice-to-have feature, which most likely doesn’t help unless you want to dive into TMQL4J internals.

    One could argue that the news entry (yes, some people like to write news entries about every gewgaw) http://www.topicmapslab.de/news/tmql-canonizer-released may be considered documentation. At least it says what you have to put in and that it returns the canonical form of a query. It mentions that, besides the main result, the parse tree is *also* output.

    The choice of the example in the news suggests that the author of the news didn’t get the point of the whole service, as he put in a query that is already almost completely in canonical form, but do you really expect me to compromise my boss here? (-;

    Let’s take the raison d’être one step further. I asked Sven Krosse to implement the canonizer while I wrote the proposal to make the Barta draft the official TMQL standard (http://www.isotopicmaps.org/pipermail/topicmapmail/2011q2/008981.html). The results were overwhelming: Nobody disagreed, nobody vetoed. There was not a single critique. Only the original author stated that he had gone on and created something even more powerful. So, what the canonizer can and should be used for is taking the examples from Robert’s draft and seeing what their canonical versions look like, which, as stated above, may be more readable for some of us.

    Back to the topic: The canonical output for the 2011 draft above is not the same as the one that was discussed on the mailing list recently.
    This draft would additionally use parentheses (the following example is manual work, not output of the canonizer):

    benjamin / roles(author) / parent(authorship) / players(document) / roles(documented) / parent() / players(thing) / names(title)

    This 2011 draft also has another shortcut for traversing associations, and a default axis on topics that includes the names. Using these rules to uncanonize the above again, the result is something like this:

    benjamin / authorship(author->document) / tm:subject(documented->thing) / title

    Unfortunately, there is no uncanonizer service yet, which would produce a concise query automatically from a given long query.

    Comment by Benjamin Bock — April 16, 2011 @ 9:51 am

  2. This blog software behaves like a 1998 web forum and strips “special characters”. I don’t know how to mark up my post correctly, as there is no documentation given here…

    Let me use the words of a wise man:

    “A bit more attention to documentation would go a long way to making this a useful […][blog and commenting software]”. (Ellipsis and addition by the author of the comment.)

    Comment by Benjamin Bock — April 16, 2011 @ 9:56 am

  3. The link in the tweet from the Topic Maps Lab and the one you see in my post both point to: http://canonizer.topicmapslab.de/tmql4j-web-ui/canonizer.jsp.

    Which had the following “documentation:”

    “This service uses the TMQL draft of 2008 and the tmql4j query engine version 3.1.0.”

    You point to somewhat fuller documentation, a pointer missing from the tweet and from the service itself.

    Curious that the tweet from the Topic Maps Lab didn’t point to its own announcement with the “documentation.”

    If the goal of the canonizer is to produce canonical output, shouldn’t failures in query syntax produce an error?

    The blogging software is publicly documented. http://www.wordpress.org

    Comment by Patrick Durusau — April 16, 2011 @ 10:29 am

  4. So, I go to http://www.wordpress.org, which is even linked from here! Yay!
    Browser’s text search for “documentation”. No results.
    Now using the search box at the top. No useful results either.
    “markup”? no
    “syntax”? well… there are plugins. kthxbye.

    What about the 10-step walk-through guide? It has screenshots of a visual editor in the 5th section. But it does not resemble the simple input box I use here.

    Back to the syntax highlighter plugins (is one installed here? How could I find out?). Even their own examples are crippled, so it seems to be a general problem.

    Let’s conclude: From http://canonizer.topicmapslab.de/tmql4j-web-ui/canonizer.jsp documentation is 2 clicks away. From here it’s more than 2. I still don’t know where.

    Thanks for the “useful” pointer to the start page. This is what is provided from the canonizer as well: a link to the start page which directly contains the announcement.

    So, back to the canonizer:
    Which query isn’t syntactically correct but doesn’t produce an error in the canonizer? That would be a bug in TMQL4J which I’d like to see fixed.

    Comment by Benjamin Bock — April 16, 2011 @ 11:03 am

  5. Your browser must not work like mine does. There is a link at the bottom of the page to TMQL4J documentation but not canonizer documentation. No other links to canonizer documentation that I can see.

    Syntactically incorrect query (which I noted in my original post):


    %prefix o http://psi.ontopia.net/music/
    FOR $topic IN // tm:subject
    RETURN
    IF topic ISA o:composer
    THEN topic >> indicators
    ELSE topic / tm:name [0]

    According to the 2008 draft, omission of the sigil on a variable is an error.

    If the sigil in “FOR $topic IN // tm:subject” is omitted, an error is returned.

    But, omission on the next three instances of “topic” is equally an error, and no error is returned.

    This was noted in my original post.

    Comment by Patrick Durusau — April 16, 2011 @ 2:01 pm

  6. The above query is syntactically valid because “topic” is a valid item-reference.

    In the first position, where the $ is needed, rule [44] applies:

    [44] variable-assignment ::= variable in content

    If you remove the sigil there the variable-assignment rule no longer matches and an error is thrown.

    However, in all the other positions, rule [22] matches:

    [20] anchor ::= constant | variable

    So, one can use either a variable or a constant. If it has a $ the “variable” part matches, without sigil, the “constant” part matches.
    A constant can be an atom or an item-reference according to

    [1] constant ::= atom | item-reference

    In your case, it is an item-reference according to

    [17] item-reference ::= identifier | QIRI |

    and using the “identifier” part we finally reach

    [16] identifier ::= /\w[\w\-\.]*/

    As you can see, “topic” (without the quotes) is a valid identifier. With quotes, it would of course be an atom.

    So, yes, the omission of the sigil on a variable is an error, but in the above usage it’s not a variable but an identifier that serves as an item-reference.
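
    A minimal Python sketch of the derivation above follows. It assumes only the rules quoted here and tries a token as a variable first, then as an atom, then as an identifier per rule [16]. The "$"-plus-word-characters variable pattern and the double-quoted-string atom pattern are simplifying assumptions. The names classify_anchor and can_bind are hypothetical. QIRIs such as tm:name are out of scope, so the sketch reports them as "invalid".

```python
import re

# Rule [16] as quoted above: identifier ::= /\w[\w\-\.]*/
IDENTIFIER = re.compile(r"\w[\w\-\.]*")
# Assumption: a variable is the "$" sigil followed by word characters.
VARIABLE = re.compile(r"\$\w+")
# Assumption: a double-quoted string is an atom.
STRING_ATOM = re.compile(r'"[^"]*"')


def classify_anchor(token: str) -> str:
    """Which branch of anchor ::= constant | variable a token takes."""
    if VARIABLE.fullmatch(token):
        return "variable"
    if STRING_ATOM.fullmatch(token):
        return "atom"  # constant ::= atom | item-reference (rule [1])
    if IDENTIFIER.fullmatch(token):
        return "item-reference"  # item-reference ::= identifier | ... (rule [17])
    return "invalid"  # also covers QIRIs, which this sketch does not model


def can_bind(token: str) -> bool:
    """Rule [44], variable-assignment ::= variable in content, needs a real variable."""
    return classify_anchor(token) == "variable"


for tok in ["$topic", "topic", '"topic"']:
    print(tok, classify_anchor(tok), can_bind(tok))
# $topic variable True
# topic item-reference False
# "topic" atom False
```

    Under these assumptions, a bare topic fails can_bind in the FOR clause but is accepted as an item-reference everywhere an anchor is allowed. That mirrors the behavior reported in this thread.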

    Conclusion: Implementation correct. Bug report invalid.

    Thanks for taking the time to test it, anyway. That this is not a bug does not mean there are no other bugs.

    Comment by Benjamin Bock — April 16, 2011 @ 3:27 pm

  7. Regarding your stupid “there is documentation somewhere on wordpress.com” thing: I still didn’t find it.

    The two clicks from http://canonizer.topicmapslab.de/ are:
    1) on the link “Topic Maps Lab” at the bottom of the page, just before the copyright sign
    2) on the link “TMQL canonizer Released” in the middle right part of the page, within the “News from the Topic Maps Lab” section.

    I agree it’s suboptimal but it’s two clicks.
    Now please show me some documentation about the syntax usable in these comments. I’m looking for something like the stuff on this page: http://www.textism.com/tools/textile/ or like http://projects.topicmapslab.de/help/wiki_syntax.html which is linked from the top right of every input box on http://projects.topicmapslab.de/ which accepts this syntax, see e.g. http://projects.topicmapslab.de/projects/topicmapsorg/issues/new

    Comment by Benjamin Bock — April 16, 2011 @ 3:35 pm

  8. Your “path” to documentation for the canonizer only works for an insider or someone who would guess hunting around the site might lead to documentation. That isn’t suboptimal, it’s bad design. All the huffing and puffing won’t change that.

    Concerning my “stupid” pointer to the WordPress site:

    1. http://wordpress.org/
    2. Choose #3 Read the Documentation
    3. Under Getting Started with WordPress, the very first link: New To WordPress – Where to Start
    4. Under Step 4 – Setup WordPress, choose Writing Posts
    5. Scroll to near the bottom of that page.

    Yes, that is bad design, but two bad designs don’t make either one right.

    Comment by Patrick Durusau — April 16, 2011 @ 6:34 pm

  9. Thanks for the path to the documentation. Those quicktags are obviously just a subset of HTML. Not that hard.

    Regarding finding documentation: it looks like I was looking on wordpress.COM yesterday. This is obviously my fault.

    The documentation you pointed at says nothing about how to enter the greater-than and less-than signs, but as the above looks like HTML, I may as well assume it works here:

    benjamin <- author [^ authorship] -> document <- documented -> thing / title

    And I can try the code tag:

    benjamin document thing / title

    Comment by Benjamin Bock — April 17, 2011 @ 1:39 am

  10. Conclusion: using &lt; and &gt; works, but the code tag is useless except for its monospaced font. It would have been easier to try if there were a preview mode.
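
    For anyone posting TMQL path expressions in comments like these, the escaping can be done mechanically. Below is a minimal sketch using Python's standard html.escape. The example input is the query from the previous comment, and the variable names are illustrative only.

```python
import html

query = "benjamin <- author [^ authorship] -> document <- documented -> thing / title"
# quote=False leaves quotation marks alone; &, < and > are still escaped
escaped = html.escape(query, quote=False)
print(escaped)
# benjamin &lt;- author [^ authorship] -&gt; document &lt;- documented -&gt; thing / title

# round trip: unescaping restores the original query
assert html.unescape(escaped) == query
```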

    Comment by Benjamin Bock — April 17, 2011 @ 1:41 am


Powered by WordPress