XML Data Clustering « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 2, 2016

Balisage 2016, 2–5 August 2016 [XML That Makes A Difference!]

Filed under: Conferences,XLink,XML,XML Data Clustering,XML Schema,XPath,XProc,XQuery,XSLT — Patrick Durusau @ 9:47 pm

Dates:

25 March 2016 — Peer review applications due
22 April 2016 — Paper submissions due
21 May 2016 — Speakers notified
10 June 2016 — Late-breaking News submissions due
16 June 2016 — Late-breaking News speakers notified
8 July 2016 — Final papers due from presenters of peer reviewed papers
8 July 2016 — Short paper or slide summary due from presenters of late-breaking news
1 August 2016 — Pre-conference Symposium
2–5 August 2016 — Balisage: The Markup Conference

From the call:

Balisage is the premier conference on the theory, practice, design, development, and application of markup. We solicit papers on any aspect of markup and its uses; topics include but are not limited to:

Web application development with XML

Informal data models and consensus-based vocabularies

Integration of XML with other technologies (e.g., content management, XSLT, XQuery)

Performance issues in parsing, XML database retrieval, or XSLT processing

Development of angle-bracket-free user interfaces for non-technical users

Semistructured data and full text search

Deployment of XML systems for enterprise data

Design and implementation of XML vocabularies

Case studies of the use of XML for publishing, interchange, or archiving

Alternatives to XML

the role(s) of XML in the application lifecycle

the role(s) of vocabularies in XML environments

Full papers should be submitted by the deadline given below. All papers are peer-reviewed — we pride ourselves that you will seldom get a more thorough, skeptical, or helpful review than the one provided by Balisage reviewers.

…

Whether in theory or practice, let’s make Balisage 2016 the one people speak of in hushed tones at future markup and information conferences.

Useful semantics continues to flounder about, cf. Vice-President Biden’s interest in “one cancer research language.” Easy enough to say. How hard could it be?

Documents are commonly thought of, and processed, as if everything from BOM to EOF were the definition of a document. Much to our impoverishment.

Silo dissing has gotten popular. What if we could have our silos and eat them too?

Let’s set our sights on a Balisage 2016 where non-technicals come away saying “I want that!”

Have your first drafts done well before the end of February, 2016!

May 29, 2012

Destination: Montreal!

Filed under: Conferences,XML,XML Data Clustering,XML Query Rewriting,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 3:04 pm

If you remember the Saturday afternoon sci-fi movies, Destination: …., then you will appreciate the title for this post.

Tommie Usdin and company just posted: Balisage 2012 Call for Late-breaking News, written in torn bodice style:

The peer-reviewed part of the Balisage 2012 program has been scheduled (and will be announced in a few days). A few slots on the Balisage program have been reserved for presentation of “Late-breaking” material.

Proposals for late-breaking slots must be received by June 15, 2012. Selection of late-breaking proposals will be made by the Balisage conference committee, instead of being made in the course of the regular peer-review process.

If you have a presentation that should be part of Balisage, please send a proposal message as plain-text email to info@balisage.net.

In order to be considered for inclusion in the final program, your proposal message must supply the following information:

The name(s) and affiliations of all author(s)/speaker(s)

The email address of the presenter

The title of the presentation

An abstract of 100-150 words, suitable for immediate distribution

Disclosure of when and where, if some part of this material has already been presented or published

An indication as to whether the presenter is comfortable giving a conference presentation and answering questions in English about the material to be presented

Your assurance that all authors are willing and able to sign the Balisage Non-exclusive Publication Agreement (http://www.balisage.net/BalisagePublicationAgreement.pdf) with respect to the proposed presentation

In order to be in serious contention for inclusion in the final program, your proposal should probably be either a) really late-breaking (it happened in the last month or two) or b) a paper, an extended paper proposal, or a very long abstract with references. Late-breaking slots are few and the competition is fiercer than for peer-reviewed papers. The more we know about your proposal, the better we can appreciate the quality of your submission.

Please feel encouraged to provide any other information that could aid the conference committee as it considers your proposal, such as a detailed outline, samples, code, and/or graphics. We expect to receive far more proposals than we can accept, so it’s important that you send enough information to make your proposal convincing and exciting. (This material may be attached to the email message, if appropriate.)

The conference committee reserves the right to make editorial changes in your abstract and/or title for the conference program and publicity. (emphasis added to last sentence)

Read that last sentence again!

“The conference committee reserves the right to make editorial changes in your abstract and/or title for the conference program and publicity.”

The conference committee might change your abstract and/or title to say something … controversial? … attention-getting? … CNN / Slashdot worthy?

Bring it on!

Submit late-breaking proposals!

Please!

February 25, 2012

XML data clustering: An overview

Filed under: Clustering,Data Clustering,XML,XML Data Clustering — Patrick Durusau @ 7:39 pm

XML data clustering: An overview by Alsayed Algergawy, Marco Mesiti, Richi Nayak, and Gunter Saake.

Abstract:

In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The presence of such a huge amount of approaches is due to the different applications requiring the clustering of XML data. These applications need data in the form of similar contents, tags, paths, structures, and semantics. In this article, we first outline the application contexts in which clustering is useful, then we survey approaches so far proposed relying on the abstract representation of data (instances or schema), on the identified similarity measure, and on the clustering algorithm. In this presentation, we aim to draw a taxonomy in which the current approaches can be classified and compared. We aim at introducing an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the article moves into the description of future trends and research issues that still need to be faced.

I thought this survey article would be of particular interest since it covers the syntax and semantics of XML that contains data.

Not to mention that our old friend, heterogeneous data, isn’t far behind:

Since XML data are engineered by different people, they often have different structural and terminological heterogeneities. The integration of heterogeneous data sources requires many tools for organizing and making their structure and content homogeneous. XML data integration is a complex activity that involves reconciliation at different levels: (1) at schema level, reconciling different representations of the same entity or property, and (2) at instance level, determining if different objects coming from different sources represent the same real-world entity. Moreover, the integration of Web data increases the integration process challenges in terms of heterogeneity of data. Such data come from different resources and it is quite hard to identify the relationship with the business subjects. Therefore, a first step in integrating XML data is to find clusters of the XML data that are similar in semantics and structure [Lee et al. 2002; Viyanon et al. 2008]. This allows system integrators to concentrate on XML data within each cluster. We remark that reconciling similar XML data is an easier task than reconciling XML data that are different in structures and semantics, since the later involves more restructuring. (emphasis added)
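To make the three ingredients from the abstract concrete (an abstract representation, a similarity measure, and a clustering algorithm), here is a minimal sketch of my own, not a method taken from the survey. It assumes each document is represented by its set of root-to-node tag paths, that similarity is plain Jaccard overlap of those sets, and that clustering is a greedy single pass against each cluster's first member. The function names, the sample documents, and the 0.5 threshold are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Abstract representation (assumed): the set of root-to-node tag paths."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Similarity measure (assumed): Jaccard overlap of two path sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Greedy single-pass clustering: a document joins the first cluster
    whose first member is similar enough, otherwise it starts a new one."""
    clusters = []  # lists of document indices
    reps = []      # path set of each cluster's first member
    for i, text in enumerate(docs):
        paths = tag_paths(text)
        for members, rep in zip(clusters, reps):
            if jaccard(paths, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
            reps.append(paths)
    return clusters


# Hypothetical sample documents: two book records and one order record.
docs = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title><author>Y</author><year>2011</year></book>",
    "<order><item><sku>1</sku></item></order>",
]
print(cluster(docs))  # [[0, 1], [2]]
```

The two book records share three of four paths and land together, while the order record shares none and gets its own cluster. This only looks at structure; the approaches the survey covers also weigh content and semantics, which a path set ignores entirely.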

Two comments to bear in mind while reading this paper.

First, print out or photocopy Table II on page 35, “Features of XML Clustering Approaches.” It will be a handy reminder/guide as you read the coverage of the various techniques.

Second, on the last page, page 41, note that the article was accepted in October of 2009 but not published until October of 2011. It’s great that the ACM has an abundance of excellent survey articles, but a two-year delay in publication is unreasonable.

Surveys in rapidly developing fields are of most interest when they are timely. Electronic publication upon final acceptance should be the rule at an organization such as the ACM.