Archive for the ‘XML Query Rewriting’ Category

Four and Twenty < / > ! Baked in a Pie…

Tuesday, May 28th, 2013

Balisage 2013 program is online!

From Tommie Usdin’s email:

Balisage is an annual conference devoted to the theory and practice of descriptive markup and related technologies for structuring and managing information. Participants typically include XML users, librarians, archivists, computer scientists, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, Topic-Map enthusiasts, semantic-Web evangelists, members of the working groups which define the specifications, academics, industrial researchers, representatives of governmental bodies and NGOs, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Discussion is open, candid, and unashamedly technical.

Major features of this year’s program include several challenges to the fundamental infrastructure of XML; case studies from government, academia, and publishing; approaches to overlapping data structures; discussions of XML’s political fortunes; and technical papers on XML, XForms, XQuery, REST, XSLT, RDF, XSL-FO, XSD, the DOM, JSON, and XPath.

Attending Balisage even once will keep you from repeating mistakes in language design.

Attending Balisage twice will mark you as a markup expert.

Attending Balisage three or more times, well, this is an open channel so we can’t go there.

But you should go to Balisage!

Send your pics from Saint Catherine Street!

Balisage 2013 – Dates/Location

Tuesday, November 20th, 2012

Tommie Usdin just posted email with the Balisage 2013 dates and location:

Montreal, Hotel Europa, August 5 – 9 , 2013

Hope that works with everything else.

That’s the entire email so I don’t know what was meant by:

Hope that works with everything else.

Short of it being your own funeral, open-heart surgery or giving birth (to your first child), I am not sure what “everything else” there could be?

You get a temporary excuse for the second two cases and a permanent excuse for the first one.

Now’s a good time to hint about plane fare plus hotel and expenses for Balisage as a stocking stuffer.

And to wish a happy holiday Tommie Usdin and to all the folks at Mulberry Technology who make Balisage possible all of us. Each and every one.

Destination: Montreal!

Tuesday, May 29th, 2012

If you remember the Saturday afternoon sci-fi movies, Destination: …., then you will appreciate the title for this post. 😉

Tommie Usdin and company just posted: Balisage 2012 Call for Late-breaking News, written in torn bodice style:

The peer-reviewed part of the Balisage 2012 program has been scheduled (and will be announced in a few days). A few slots on the Balisage program have been reserved for presentation of “Late-breaking” material.

Proposals for late-breaking slots must be received by June 15, 2012. Selection of late-breaking proposals will be made by the Balisage conference committee, instead of being made in the course of the regular peer-review process.

If you have a presentation that should be part of Balisage, please send a proposal message as plain-text email to

In order to be considered for inclusion in the final program, your proposal message must supply the following information:

  • The name(s) and affiliations of all author(s)/speaker(s)
  • The email address of the presenter
  • The title of the presentation
  • An abstract of 100-150 words, suitable for immediate distribution
  • Disclosure of when and where, if some part of this material has already been presented or published
  • An indication as to whether the presenter is comfortable giving a conference presentation and answering questions in English about the material to be presented
  • Your assurance that all authors are willing and able to sign the Balisage Non-exclusive Publication Agreement ( with respect to the proposed presentation

In order to be in serious contention for inclusion in the final program, your proposal should probably be either a) really late-breaking (it happened in the last month or two) or b) a paper, an extended paper proposal, or a very long abstract with references. Late-breaking slots are few and the competition is fiercer than for peer-reviewed papers. The more we know about your proposal, the better we can appreciate the quality of your submission.

Please feel encouraged to provide any other information that could aid the conference committee as it considers your proposal, such as a detailed outline, samples, code, and/or graphics. We expect to receive far more proposals than we can accept, so it’s important that you send enough information to make your proposal convincing and exciting. (This material may be attached to the email message, if appropriate.)

The conference committee reserves the right to make editorial changes in your abstract and/or title for the conference program and publicity. (emphasis added to last sentence)

Read that last sentence again!

The conference committee reserves the right to make editorial changes in your abstract and/or title for the conference program and publicity.

The conference committee might change your abstract and/or title to say something …. controversial? ….attention getting? ….CNN / Slashdot worthy?

Bring it on!

Submit late breaking proposals!


Constraint-Based XML Query Rewriting for Data Integration

Monday, April 16th, 2012

Constraint-Based XML Query Rewriting for Data Integration by Cong Yu and Lucian Popa.


We study the problem of answering queries through a target schema, given a set of mappings between one or more source schemas and this target schema, and given that the data is at the sources. The schemas can be any combination of relational or XML schemas, and can be independently designed. In addition to the source-to-target mappings, we consider as part of the mapping scenario a set of target constraints specifying additional properties on the target schema. This becomes particularly important when integrating data from multiple data sources with overlapping data and when such constraints can express data merging rules at the target. We define the semantics of query answering in such an integration scenario, and design two novel algorithms, basic query rewrite and query resolution, to implement the semantics. The basic query rewrite algorithm reformulates target queries in terms of the source schemas, based on the mappings. The query resolution algorithm generates additional rewritings that merge related information from multiple sources and assemble a coherent view of the data, by incorporating target constraints. The algorithms are implemented and then evaluated using a comprehensive set of experiments based on both synthetic and real-life data integration scenarios.

Who does this sound like?:

Data merging is notoriously hard for data integration and often not dealt with. Integration of scientific data, however, offers many complex scenarios where data merging is required. For example, proteins (each with a unique protein id) are often stored in multiple biological databases, each of which independently maintains different aspects of the protein data (e.g., structures, biological functions, etc.). When querying on a given protein through a target schema, it is important to merge all its relevant data (e.g., structures from one source, functions from another) given the constraint that protein id identifies all components of the protein.

When target constraints are present, it is not enough to consider only the mappings for query answering. The target instance that a query should “observe” must be defined by the interaction between all the mappings from the sources and all the target constraints. This interaction can be quite complex when schemas and mappings are nested and when the data merging rules can enable each other, possibly, in a recursive way. Hence, one of the first problems that we study in this paper is what it means, in a precise sense, to answer the target queries in the “best” way, given that the target instance is specified, indirectly, via the mappings and the target constraints. The rest of the paper will then address how to compute the correct answers without materializing the full target instance, via two novel algorithms that rewrite the target query into a set of corresponding source queries.

Wrong! 😉

The ACM reports sixty-seven (67) citations of this paper as of today. (Paper published in 2004.) Summaries of any of the citing literature welcome!

The question of data integration persists to this day. I take that to indicate that whatever the merits of this approach, data integration issues remain unsolved.

What are the merits/demerits of this approach?