Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 24, 2014

Free MarkLogic Classes

Filed under: MarkLogic,XML — Patrick Durusau @ 3:07 pm

Free MarkLogic Classes

From the webpage:

MarkLogic University offers FREE publicly scheduled instructor led courses! Here’s how it works:

  • Sign up for any public class listed below by paying the Booking Fee
  • Once you have completed the course the Booking Fee will be fully refunded within 7 business days
  • If you register for the class but do not attend you will forfeit your Booking Fee
  • If you have any questions please contact training@marklogic.com

Vendor specific I know but you can’t argue with the pricing scheme. If anything, it should help encourage you to attend and complete the classes.

If you take one (or more) of these courses, please comment or send me a private message. Thanks!

May 20, 2014

<oXygen/> XML Editor 16.0

Filed under: Editor,XML — Patrick Durusau @ 12:51 pm

<oXygen/> XML Editor 16.0

From the post:

<oXygen/> XML Editor 16 increases your productivity for XSLT development with the addition of Quick Fixes and improvements to refactoring actions. Saxon-CE specific extensions are supported and you can apply now XPath queries on multiple files.

If you use Ant to orchestrate build processes then <oXygen/> will support you with a powerful Ant editor featuring validation, content completion, outline view, syntax highlight and search and refactoring actions.

Working with conditional content is a lot easier now as you can set different colors and styles for each condition or focus exclusively on a specific deliverable by hiding all excluded content. You can modify DITA and DocBook tables easily using the new table properties action.

You can customize the style of the <oXygen/> WebHelp output to look exactly as you want using the new WebHelp skin builder.

As usual, the new version includes many component updates and new API functionality.
….

Too many changes and new features to list!

Not cheap and has a learning curve but if you are looking for a top end XML editor, you need look no further.

April 25, 2014

7 First Public Working Drafts of XQuery and XPath 3.1

Filed under: XML,XPath,XQuery,XSLT — Patrick Durusau @ 8:14 pm

7 First Public Working Drafts of XQuery and XPath 3.1

From the post:

Today the XML Query Working Group and the XSLT Working Group have published seven First Public Working Drafts, four of which are jointly developed and three are from the XQuery Working Group.

The joint documents are:

  • XML Path Language (XPath) 3.1. XPath is a powerful expression language that allows the processing of values conforming to the data model defined in the XQuery and XPath Data Model. The main features of XPath 3.1 are maps and arrays.
  • XPath and XQuery Functions and Operators 3.1. This specification defines a library of functions available for use in XPath, XQuery, XSLT and other languages.
  • XQuery and XPath Data Model 3.1. This specification defines the data model on which all operations of XPath 3.1, XQuery 3.1, and XSLT 3.1 operate.
  • XSLT and XQuery Serialization 3.1. This document defines serialization of an instance of the XQuery and XPath Data Model into a sequence of octets, such as into XML, text, HTML, JSON.

The three XML Query Working Group documents are:

  • XQuery 3.1 Requirements and Use Cases, which describes the reasons for producing XQuery 3.1, and gives examples.
  • XQuery 3.1: An XML Query Language. XQuery is a versatile query and application development language, capable of processing the information content of diverse data sources including structured and semi-structured documents, relational databases and tree-based databases. The XQuery language is designed to support powerful optimizations and pre-compilation leading to very efficient searches over large amounts of data, including over so-called XML-native databases that read and write XML but have an efficient internal storage. The 3.1 version adds support for features such as arrays and maps primarily to facilitate processing of JSON and other structures.
  • XQueryX 3.1, which defines an XML syntax for XQuery 3.1.

Learn more about the XML Activity.

To show you how far behind I am on my reading, I haven’t even ordered Michael Kay’s XSLT 3.0 and XPath 3.0 book and the W3C is already working on 3.1 for both. 😉

I am hopeful that Michael will duplicate his success with XSLT 2.0 and XPath 2.0. This time though, I am going to get the Kindle edition. 😉

April 10, 2014

The X’s Are In Town

Filed under: HyTime,W3C,XML,XPath,XQuery — Patrick Durusau @ 6:53 pm

XQuery 3.0, XPath 3.0, XQueryX 3.0, XDM 3.0, Serialization 3.0, Functions and Operators 3.0 are now W3C Recommendations

From the post:

The XML Query Working Group published XQuery 3.0: An XML Query Language, along with XQueryX, an XML representation for XQuery, both as W3C Recommendations, as well as the XQuery 3.0 Use Cases and Requirements as final Working Group Notes. XQuery extends the XPath language to provide efficient search and manipulation of information represented as trees from a variety of sources.

The XML Query Working Group and XSLT Working Group also jointly published W3C Recommendations of XML Path Language (XPath) 3.0, a widely-used language for searching and pointing into tree-based structures, together with XQuery and XPath Data Model 3.0 which defines those structures, XPath and XQuery Functions and Operators 3.0 which provides facilities for use in XPath, XQuery, XSLT and a number of other languages, and finally the XSLT and XQuery Serialization 3.0 specification giving a way to turn values and XDM instances into text, HTML or XML.

Read about the XML Activity.

I was wondering what I was going to have to read this coming weekend. 😉

It may just be me but the “…provide efficient search and manipulation of information represented as trees from a variety of sources…” sounds a lot like groves to me.

You?
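To make “search and manipulation of information represented as trees” a bit more concrete, here is a minimal sketch in Python rather than XQuery. It uses the standard library's xml.etree.ElementTree, which supports only a small XPath subset, so it is nowhere near XQuery 3.0 in power. The sample document, its element names, and the two helper functions are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample document; element and attribute names are assumptions.
SAMPLE = """
<library>
  <book year="2008"><title>Title A</title></book>
  <book year="2014"><title>Title B</title></book>
</library>
"""

def titles_for_year(xml_text, year):
    """Search: return titles of <book> elements whose year attribute matches."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset allows attribute predicates such as [@year='2014'].
    return [b.findtext("title") for b in root.findall(f"./book[@year='{year}']")]

def add_book(xml_text, year, title):
    """Manipulation: return a new serialization with one more <book> appended."""
    root = ET.fromstring(xml_text)
    book = ET.SubElement(root, "book", {"year": str(year)})
    ET.SubElement(book, "title").text = title
    return ET.tostring(root, encoding="unicode")
```

Both functions treat the document purely as a tree of nodes, which is the point of the comparison: nothing here depends on where the tree came from.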

March 25, 2014

Shadow DOM

Filed under: CSS3,Graphics,HTML,Visualization,XML — Patrick Durusau @ 3:15 pm

Shadow DOM by Steven Wittens.

From the post:

For a while now I’ve been working on MathBox 2. I want to have an environment where you take a bunch of mathematical legos, bind them to data models, draw them, and modify them interactively at scale. Preferably in a web browser.

Unfortunately HTML is crufty, CSS is annoying and the DOM’s unwieldy. Hence we now have libraries like React. It creates its own virtual DOM just to be able to manipulate the real one—the Agile Bureaucracy design pattern.

The more we can avoid the DOM, the better. But why? And can we fix it?
….

One of the better posts on markup that I have read in a very long time.

Also of interest, Steven’s heavy interest in graphics and visualization.

His MathBox project for example.

March 18, 2014

Balisage Papers Due 18 April 2014

Filed under: Conferences,XML,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 2:21 pm

Unlike the rolling dates for Obamacare, Balisage Papers are due 18 April 2014. (That’s this year for health care wonks.)

From the website:

Balisage is an annual conference devoted to the theory and practice of descriptive markup and related technologies for structuring and managing information.

Are you interested in open information, reusable documents, and vendor and application independence? Then you need descriptive markup, and Balisage is the conference you should attend. Balisage brings together document architects, librarians, archivists, computer scientists, XML wizards, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, Topic-Map enthusiasts, semantic-Web evangelists, standards developers, academics, industrial researchers, government and NGO staff, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Some participants are busy designing replacements for XML while others still use SGML (and know why they do). Discussion is open, candid, and unashamedly technical. Content-free marketing spiels are unwelcome and ineffective.

I can summarize that for you:

There are conferences on the latest IT buzz.

There are conferences on last year’s IT buzz.

Then there are conferences on information as power, which decides who will sup and who will serve.

Balisage is about information as power. How you use it, well, that’s up to you.

March 15, 2014

Saxon/C

Filed under: Saxon,XML,XQuery,XSLT — Patrick Durusau @ 9:41 pm

Saxon/C by Michael Kay.

From the webpage:

Saxon/C is an alpha release of Saxon-HE on the C/C++ programming platform. APIs are offered currently to run XSLT 2.0 and XQuery 1.0 from C/C++ or PHP applications.

BTW, Michael’s Why XSLT and XQuery? page reports that Saxon is passing more than 25,000 tests for XQuery 3.0.

If you are unfamiliar with either Saxon or Michael Kay, you need to change to a department that is using XML.

March 1, 2014

Legislative XML Data Mapping Results

Filed under: Government,Law - Sources,XML — Patrick Durusau @ 7:57 pm

Legislative XML Data Mapping Results

You may recall last September (2013) when I posted: Legislative XML Data Mapping [$10K], which was a challenge to convert documents encoded in U.S. Congress and U.K. Parliament markup into Akoma Ntoso.

There were five (5) entries and two (2) winners.

The first place winner reports:

The included web application, an instance of which is running at akoma-ntoso.appspot.com, converts documents to Akoma Ntoso in response to common HTTP requests. Visit the app with a web browser, enter the URL of the source XML into the form, and the app responds with an Akoma Ntoso representation of the source document. Requests can even be made without a browser by passing the source document’s URL directly as the “source” parameter, e.g.,

But I was unable to find the files with the included .xsl transforms.

The second place winner reports the use of Perl scripts that can be found at: http://ec2-50-112-47-161.us-west-2.compute.amazonaws.com/xml-akoma-ntoso/XML-AkomaNtoso-0.1.tar.gz

I was unable to find any formal comparison of the entries. Perhaps you will have better luck.

And I am curious, if you encountered a “converted” form of a U.S. or U.K. statute, would you be able to faithfully reconstruct the original?

February 10, 2014

Free MarkLogic Courses?

Filed under: MarkLogic,XML — Patrick Durusau @ 3:02 pm

MarkLogic Announces Free NoSQL Database Training Courses

From the post:

MarkLogic Corporation, the leading Enterprise NoSQL database platform company, today announced the schedule for its MarkLogic University public courses with hands-on instruction to attending users and developers free of charge. The courses are led by an instructor in various live, online, and classroom locations, and provide MarkLogic customers and developers with the training to optimize their NoSQL database deployments and the education to develop applications on the MarkLogic database.

Since 2001, MarkLogic has focused on providing a powerful and trusted Enterprise NoSQL database platform that empowers organizations to turn all data into valuable and actionable information. The MarkLogic University program was created to give customers the access to best practices for managing vast amounts of diverse data. Now project managers, architects, developers, testers, and administrators can improve their MarkLogic skills with no cost training.

“The demand for MarkLogic development and administration skills is increasing in the market and with a sharp focus on customer success, we are dedicated to providing easy access to information and education that will assist developers and IT professionals to better manage and do more with their data,” said Jon Bakke, senior vice president, global technical services, MarkLogic. “By making MarkLogic training resources widely available, we are helping to build up much-needed technical skills that enterprises need to derive value from the vast amounts of enterprise data that is being created and stored today.”

But, when I visit: http://www.marklogic.com/services/training/class-schedule/

I see refundable booking fees. (As of 10 February 2014 at 15:00 EST.)

Nor could I find a statement by MarkLogic on its blog or pressroom confirming free classes.

I have seen this at several sources and suggest further inquiry before anyone gets too excited.

February 9, 2014

Generating an XML test corpus

Filed under: MarkLogic,XML — Patrick Durusau @ 5:58 pm

Generating an XML test corpus by Anthony Coates.

From the post:

My current role requires me to work with the MarkLogic NoSQL database. I’ve had some experience with it in the past, if not as much as I would have liked to have had.

Compared to relational databases, “document databases” like MarkLogic have the advantage that content is stored in a denormalised “document” format. If you have your data denormalised appropriately into documents, such that each query requires only a single document, then the database gives its optimum performance. With relational databases, there’s generally no way to avoid having some joins in queries, even if some of the data is denormalised into tables.
….

Anthony is an old hand with XML and has started a new blog.

I am particularly interested in Anthony’s questions about linking documents, denormalizing data, to say nothing of generating the test corpus.
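The trade-off in the quoted passage can be sketched in a few lines of Python. This is only an illustration with plain dictionaries and invented customer/order data; it says nothing about how MarkLogic actually stores or indexes documents.

```python
# Normalized, relational-style data: answering one question needs a join.
# All names and values below are invented for illustration.
customers = {1: {"name": "Example Corp"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 1, "total": 40.0},
]

def orders_with_names(customers, orders):
    """Join each order to its customer row, as a relational query would."""
    return [
        {
            "order_id": o["order_id"],
            "customer": customers[o["customer_id"]]["name"],
            "total": o["total"],
        }
        for o in orders
    ]

def denormalize(customers, orders):
    """Build one self-contained document per customer, embedding its orders."""
    docs = {}
    for cid, c in customers.items():
        docs[cid] = {
            "name": c["name"],
            "orders": [
                {"order_id": o["order_id"], "total": o["total"]}
                for o in orders
                if o["customer_id"] == cid
            ],
        }
    return docs

def customer_total(doc):
    """Answer the query from a single document, with no join."""
    return sum(o["total"] for o in doc["orders"])
```

The cost of the denormalized form shows up as soon as two documents need to refer to each other, which is where questions about linking documents come in.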

I signed up for the RSS feed but don’t depend on me to mention every post. 😉

January 31, 2014

DOCX -> HTML/CSS

Filed under: Conversion,Microsoft,XML — Patrick Durusau @ 2:04 pm

Transform DOCX to HTML/CSS with High-Fidelity using PowerTools for Open XML by Eric White.

From the post:

Today I am happy to announce the release of HtmlConverter version 2.06.00, which is a high fidelity conversion from DOCX to HTML/CSS. HtmlConverter is a module in the PowerTools for Open XML project.

….
HtmlConverter.cs 2.06.00 supports:

  • Paragraph styles, character styles, and table styles, including styles that are based on other styles.
  • Table styles include support for conditional table style options (header row, total row, banded rows, first column, last column, and banded columns).
  • Fonts, including font styles such as bold, italic, underline, strikethrough, foreground and background colors, shading, sub-script, super-script, and more.  HtmlConverter is, in effect, guidance on how to correctly determine the font and formatting for each paragraph and text run in a document.
  • Numbered and bulleted lists.  Current support is only for en-US and fr-FR; however, HtmlConverter is factored and parameterized so that you can support other languages without altering the source code.  In the near future, I’ll be publishing guidance and instructions on how to support additional languages, and I’ll be asking for volunteers to write and contribute the bits of code to generate canonical (one, two, three) and ordinal (first, second, third) implementations for your native language, as well as the various Asian and RTL numbering systems.
  • Tabs, including left tabs, right tabs, centered tabs, and decimal tabs.  HtmlConverter takes the approach of using font metrics to calculate the exact width of the various pieces of text in a line, and inserts <span> elements with precisely calculated widths.
  • High fidelity support for vertical white space and horizontal white space, including indented text, hanging indents, centered text, right justified text, and justified text.
  • Borders around paragraphs, and high fidelity for borders of tables.
  • Horizontally and vertically merged cells in tables.
  • External hyperlinks, and internal hyperlinks to bookmarks within the document.
  • You have much more control over the conversion when compared to other approaches to converting to HTML.  There are already a number of parameters that enable you to control the transformation, and in the future I’ll be adding many more knobs and levers to fine tune the conversion.  And of course, you have the source code, so you can customize the conversion for your scenario.

See Eric’s post for questions about what priority desired features should have for addition to HtmlConverter.

BTW:

PowerTools for Open XML is licensed under the Microsoft Public License (Ms-PL), which gives you wide latitude in how you use the code, including its use in commercial products and open source projects.

It won’t be long until “not open source” software will be worthy of comment.

I first saw this in a tweet by Open Microsoft.

January 30, 2014

XQueryX 3.0 Proposed Recommendation Published

Filed under: XML,XQuery — Patrick Durusau @ 10:27 am

XQueryX 3.0 Proposed Recommendation Published

From the post:

The XML Query Working Group has published a Proposed Recommendation of XQueryX 3.0. XQueryX is an XML representation of an XQuery. It was created by mapping the productions of the XQuery grammar into XML productions. The result is not particularly convenient for humans to read and write, but it is easy for programs to parse, and because XQueryX is represented in XML, standard XML tools can be used to create, interpret, or modify queries. Comments are welcome through 25 February 2014.

Be mindful of the 25 February 2014 deadline for comments and enjoy!

January 14, 2014

Balisage 2014: Near the Belly of the Beast

Filed under: Conferences,HyTime,XML,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 7:29 pm

Balisage: The Markup Conference 2014 Bethesda North Marriott Hotel & Conference Center, just outside Washington, DC

Key dates:
– 28 March 2014 — Peer review applications due
– 18 April 2014 — Paper submissions due
– 18 April 2014 — Applications for student support awards due
– 20 May 2014 — Speakers notified
– 11 July 2014 — Final papers due
– 4 August 2014 — Pre-conference Symposium
– 5–8 August 2014 — Balisage: The Markup Conference

From the call for participation:

Balisage is the premier conference on the theory, practice, design, development, and application of markup. We solicit papers on any aspect of markup and its uses; topics include but are not limited to:

  • Cutting-edge applications of XML and related technologies
  • Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
  • Performance issues in parsing, XML database retrieval, or XSLT processing
  • Development of angle-bracket-free user interfaces for non-technical users
  • Deployment of XML systems for enterprise data
  • Design and implementation of XML vocabularies
  • Case studies of the use of XML for publishing, interchange, or archiving
  • Alternatives to XML
  • Expressive power and application adequacy of XSD, Relax NG, DTDs, Schematron, and other schema languages

Detailed Call for Participation: http://balisage.net/Call4Participation.html
About Balisage: http://balisage.net/Call4Participation.html
Instructions for authors: http://balisage.net/authorinstructions.html

For more information: info@balisage.net or +1 301 315 9631

I checked, from the conference hotel you are anywhere from 25.6 to 27.9 miles by car from the NSA Visitor Center at Fort Meade.

Take appropriate security measures.

When I heard Balisage was going to be in Bethesda, the first song that came to mind was Back in the U.S.S.R., followed quickly by Leonard Cohen’s Democracy Is Coming to the U.S.A.

I don’t know where the equivalent of St. Catherine Street of Montreal is in Bethesda. But when I find out, you will be the first to know!

Balisage is simply the best markup technology conference. (full stop) Start working on your manager now to get time to write a paper and to attend Balisage.

When the time comes for “big data” to make sense, markup will be there to answer the call. You should be too.

January 3, 2014

xslt3testbed

Filed under: XML,XPath,XSLT — Patrick Durusau @ 5:30 pm

xslt3testbed

From the post:

Testbed for trying out XSLT 3.0 (http://www.w3.org/TR/xslt-30/) techniques.

Since few people yet have much (or any) experience using XSLT 3.0 on more than toy examples, this is a public, medium-sized XSLT 3.0 project where people could try out new XSLT 3.0 features on the transformations to (X)HTML(5) and XSL-FO that are what we do most often and, along the way, maybe come up with new design patterns for doing transformations using the higher-order functions, partial function application, and other goodies that XSLT 3.0 gives us.

If you haven’t been investigating XSLT 3.0 (and related specifications) you need to take corrective action.

As an incentive, read Pearls Of XSLT And XPath 3.0 Design.

If you thought XSLT was useful for data operations, you will be amazed by XSLT 3.0!

November 14, 2013

Querying rich text with Lux

Filed under: Lucene,Query Language,XML,XQuery — Patrick Durusau @ 11:17 am

Querying rich text with Lux – XQuery for Lucene by Michael Sokolov.

Slide deck that highlights features of Lux, which is billed at its homepage as:

The XML Search Engine Lux is an open source XML search engine formed by fusing two excellent technologies: the Apache Lucene/Solr search index and the Saxon XQuery/XSLT processor.

Not surprisingly, I am in favor of using XML to provide context for data.

You can get a better feel for Lux by:

Reading Indexing Queries in Lux by Michael Sokolov (Balisage 2013)

Visiting the Lux homepage: http://luxdb.org

Downloading Lux Source: http://github.com/msokolov/lux

BTW, Michael does have experience with XML based content: safaribooksonline.com, oed.com, degruyter.com, oxfordreference.com and others.

PS: Remember any comments on XQuery 3.0 are due by November 19, 2013.

October 22, 2013

X* 3.0 Proposed Recommendations

Filed under: XML,XPath,XQuery,XSLT — Patrick Durusau @ 8:01 pm

XQuery 3.0, XPath 3.0, Data Model, Functions and Operators and XSLT and XQuery Serialization 3.0

From the post:

The XML Query Working Group and the XSLT Working Group have published five Proposed Recommendations today:

Comments are welcome through 19 November. Learn more about the Extensible Markup Language (XML) Activity.

What’s today? October 22nd?

You almost have 30 days. 😉

Which one or more are you going to read?

I first saw this in a tweet by Jonathan Robie.

September 24, 2013

Rumors of Legends (the TMRM kind?)

Filed under: Bioinformatics,Biomedical,Legends,Semantics,TMRM,XML — Patrick Durusau @ 3:42 pm

BioC: a minimalist approach to interoperability for biomedical text processing (numerous authors, see the article).

Abstract:

A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/.

From the introduction:

With the proliferation of natural language text, text mining has emerged as an important research area. As a result many researchers are developing natural language processing (NLP) and information retrieval tools for text mining purposes. However, while the capabilities and the quality of tools continue to grow, it remains challenging to combine these into more complex systems. Every new generation of researchers creates their own software specific to their research, their environment and the format of the data they study; possibly due to the fact that this is the path requiring the least labor. However, with every new cycle restarting in this manner, the sophistication of systems that can be developed is limited. (emphasis added)

That is the experience with creating electronic versions of the Hebrew Bible. Every project has started from a blank screen, requiring re-proofing of the same text, etc. As a result, there is no electronic encoding of the masora magna (think long margin notes). Duplicated effort has a real cost to scholarship.

The authors stray into legend land when they write:

Our approach to these problems is what we would like to call a ‘minimalist’ approach. How ‘little’ can one do to obtain interoperability? We provide an extensible mark-up language (XML) document type definition (DTD) defining ways in which a document can contain text, annotations and relations. Major XML elements may contain ‘infon’ elements, which store key-value pairs with any desired semantic information. We have adapted the term ‘infon’ from the writings of Devlin (1), where it is given the sense of a discrete item of information. An associated ‘key’ file is necessary to define the semantics that appear in tags such as the infon elements. Key files are simple text files where the developer defines the semantics associated with the data. Different corpora or annotation sets sharing the same semantics may reuse an existing key file, thus representing an accepted standard for a particular data type. In addition, key files may describe a new kind of data not seen before. At this point we prescribe no semantic standards. BioC users are encouraged to create their own key files to represent their BioC data collections. In time, we believe, the most useful key files will develop a life of their own, thus providing emerging standards that are naturally adopted by the community.

The “key files” don’t specify subject identities for the purposes of merging. But defining the semantics of data is a first step in that direction.
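A rough sketch of what reading ‘infon’ key-value pairs might look like, in Python with the standard library. The XML fragment is hand-written for illustration and only loosely modeled on the description quoted above; the real structure is defined by the BioC DTD at http://bioc.sourceforge.net/, and the helper names here are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment; not taken from the BioC distribution.
FRAGMENT = """
<document>
  <id>doc-1</id>
  <passage>
    <infon key="type">title</infon>
    <text>BRCA1 and breast cancer</text>
    <annotation id="a1">
      <infon key="type">gene</infon>
      <text>BRCA1</text>
    </annotation>
  </passage>
</document>
"""

def infons(element):
    """Collect the key-value pairs held in an element's direct <infon> children."""
    return {i.get("key"): (i.text or "") for i in element.findall("infon")}

def annotations_by_type(xml_text):
    """Group annotation texts by the value of their 'type' infon."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for ann in root.iter("annotation"):
        kind = infons(ann).get("type", "unknown")
        grouped.setdefault(kind, []).append(ann.findtext("text"))
    return grouped
```

Note that nothing in the XML says what “gene” means. That is the job of the key file, which is why the key files are the interesting part.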

I like the idea of popular “key files” (read legends) taking on a life of their own due to their usefulness. An economic activity based on reducing the friction in using or re-using data. That should have legs.

BTW, don’t overlook the authors’ data and code, available at: http://bioc.sourceforge.net/.

September 2, 2013

W3C Cheatsheet

Filed under: W3C,XLink,XML — Patrick Durusau @ 7:07 pm

W3C Cheatsheet

You can see the cheatsheet in action or look at the developer documentation.

Interesting resource but needs wider coverage.

Do you recall a Windows executable that was an index of all the XML standards? I remember it quite distinctly but haven’t seen it in years now. Freeware product with updates.

I will look on old external drives and laptops to see if I have a copy.

It would be very useful to have a complete index to W3C work with scoping by version and default to the latest “official” release.

August 7, 2013

BaseX 7.7 has been released!

Filed under: BaseX,XML,XPath,XQuery — Patrick Durusau @ 6:27 pm

BaseX 7.7 has been released!

From the webpage:

BaseX is a very light-weight, high-performance and scalable XML Database engine and XPath/XQuery 3.0 Processor, including full support for the W3C Update and Full Text extensions. An interactive and user-friendly GUI frontend gives you great insight into your XML documents.

To maximize your productivity and workflows, we offer professional support, highly customized software solutions and individual trainings on XML, XQuery and BaseX. Our product itself is completely Open Source (BSD-licensed) and platform independent; join our mailing lists to get regular updates!

But most important: BaseX runs out of the box and is easy to use…

This was a fortunate find. I have some XML work coming up and need to look at the latest offerings.

July 23, 2013

XML Calabash

Filed under: XML,XML Schema,XProc — Patrick Durusau @ 2:02 pm

XML Calabash

From the webpage:

XML Calabash is an implementation of XProc: An XML Pipeline Language.

See the XML Calabash project status page for more details.

You can download Calabash and/or read the (very little bit of) documentation. Calabash also ships with the <oXygen/> XML editor (as does Saxon-EE which includes support for validation with W3C XML Schema).

A new release of Calabash reminded me that I needed to update some of my XML tooling.

If you are looking for an opportunity to write documentation, this could be your lucky day! 😉

July 9, 2013

Graph-based Approach to Automatic Taxonomy Generation (GraBTax)

Filed under: Graphs,Taxonomy,Topic Maps,XML — Patrick Durusau @ 7:43 pm

Graph-based Approach to Automatic Taxonomy Generation (GraBTax) by Pucktada Treeratpituk, Madian Khabsa, C. Lee Giles.

Abstract:

We propose a novel graph-based approach for constructing concept hierarchy from a large text corpus. Our algorithm, GraBTax, incorporates both statistical co-occurrences and lexical similarity in optimizing the structure of the taxonomy. To automatically generate topic-dependent taxonomies from a large text corpus, GraBTax first extracts topical terms and their relationships from the corpus. The algorithm then constructs a weighted graph representing topics and their associations. A graph partitioning algorithm is then used to recursively partition the topic graph into a taxonomy. For evaluation, we apply GraBTax to articles, primarily computer science, in the CiteSeerX digital library and search engine. The quality of the resulting concept hierarchy is assessed by both human judges and comparison with Wikipedia categories.

Interesting work.

For example:

Unfortunately, existing taxonomies for concepts in computer science such as ODP categories and the ACM Classification System are unsuitable as a gold standard. ODP categories are too broad and do not contain the majority of concepts produced by our algorithm. For instance, there are no sub-concepts for “Semantic Web” in ODP. Also some portions of ODP categories under computer science are not computer science related concepts, especially at the lower level. For example, the concepts under “Neural Networks” are Books, People, Companies, Publications, FAQs, Help and Tutorials, etc. The ACM Classification System has similar drawbacks, where its categories are too broad for comparison.

Makes me curious whether the topics extracted from articles would consistently map to the broad categories assigned by the ACM.

Also instructive for the use of graphs, which admit to no pre-determined data structure.

I say that because of an on-going discussion about alternative data models for topic maps.

As you know, I don’t think topic maps have only one data model, not even my own.

The model you construct with your topic map should meet your needs, not mine.

Graphs are a good example of interchangeable information artifacts despite no one being able to constrain the graphs of others.

XML is another, although it gets overlooked from time to time.

PS: The authors don’t say but I am assuming that ODP = Open Directory Project.
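The abstract's step of constructing “a weighted graph representing topics and their associations” can be illustrated with a toy sketch in Python. This is not the authors' algorithm: it counts only document-level co-occurrence, ignores lexical similarity, and does no graph partitioning. The function names and the sample terms in the usage note are invented.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(docs):
    """Map each unordered term pair to the number of documents containing both."""
    weights = Counter()
    for terms in docs:
        # Sort so each pair has one canonical (a, b) key regardless of input order.
        for a, b in combinations(sorted(set(terms)), 2):
            weights[(a, b)] += 1
    return weights

def strongest_neighbor(weights, term):
    """Return the term most often seen with `term`, or None if it has no edges."""
    best, best_w = None, 0
    for (a, b), w in weights.items():
        if term in (a, b) and w > best_w:
            best, best_w = (b if a == term else a), w
    return best
```

Given term lists such as ["semantic web", "rdf", "ontology"] and ["semantic web", "rdf"], the edge between "rdf" and "semantic web" gets weight 2. A taxonomy builder would then have to decide how to cut such a graph into a hierarchy, which is the hard part the paper addresses.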

June 24, 2013

Balisage 2013 Program Finalized

Filed under: Conferences,XML — Patrick Durusau @ 3:32 pm

It seems to happen every year when Balisage finalizes its program.

There is a burst of not very interesting or important stories that drive the Balisage final program off the home page at CNN.com.

Instead, you can read about an old lecher, the nine ride again, and who wants to go to Ecuador?

Why any of that would kick the final Balisage program off CNN’s homepage, I can’t say.

What I can say is how excellent the late additions to the program appear:

Topics added include:
  • The new W3C publishing activity
  • Marking up Changes in XML Documents
  • Comparing Document Grammars using XQuery
  • User interface styles for a web interface design framework
  • Rights metadata standards
  • A general purpose architecture for making slides from XML documents
  • Architectural forms for the 21st century

I particularly want to hear about “Architectural forms for the 21st century!”

Details:

Schedule At A Glance: http://www.balisage.net/2013/At-A-Glance.html
Detailed program: http://www.balisage.net/2013/Program.html

June 18, 2013

Lux (0.9 – New Release)

Filed under: Indexing,Lux,XML,XQuery — Patrick Durusau @ 12:46 pm

Lux – The XML Search Engine

From the webpage:

Lux is an open source XML search engine formed by fusing two excellent technologies: the Apache Lucene/Solr search index and the Saxon XQuery/XSLT processor.

Release notes for 0.9 (released today)

This looks quite promising!

May 28, 2013

Four and Twenty < / > ! Baked in a Pie…

Filed under: Conferences,XML,XML Database,XML Query Rewriting,XML Schema,XQuery,XSLT — Patrick Durusau @ 2:53 pm

Balisage 2013 program is online!

From Tommie Usdin’s email:

Balisage is an annual conference devoted to the theory and practice of descriptive markup and related technologies for structuring and managing information. Participants typically include XML users, librarians, archivists, computer scientists, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, Topic-Map enthusiasts, semantic-Web evangelists, members of the working groups which define the specifications, academics, industrial researchers, representatives of governmental bodies and NGOs, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Discussion is open, candid, and unashamedly technical.

Major features of this year’s program include several challenges to the fundamental infrastructure of XML; case studies from government, academia, and publishing; approaches to overlapping data structures; discussions of XML’s political fortunes; and technical papers on XML, XForms, XQuery, REST, XSLT, RDF, XSL-FO, XSD, the DOM, JSON, and XPath.

Attending Balisage even once will keep you from repeating mistakes in language design.

Attending Balisage twice will mark you as a markup expert.

Attending Balisage three or more times, well, this is an open channel so we can’t go there.

But you should go to Balisage!

Send your pics from Saint Catherine Street!

April 12, 2013

…Apache HBase REST Interface, Part 2

Filed under: HBase,JSON,XML — Patrick Durusau @ 4:29 pm

How-to: Use the Apache HBase REST Interface, Part 2 by Jesse Anderson.

From the post:

This how-to is the second in a series that explores the use of the Apache HBase REST interface. Part 1 covered HBase REST fundamentals, some Python caveats, and table administration. Part 2 below will show you how to insert multiple rows at once using XML and JSON. The full code samples can be found on GitHub.

Only fair to cover both XML and TBL’s new favorite, JSON. (Tim Berners-Lee Renounces XML?)
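For a feel of what a multi-row JSON insert looks like, here is a minimal sketch that only builds the request body. It assumes the payload shape used by the HBase REST gateway as I understand it (a `Row` list, each row carrying a `key` and a `Cell` list, with keys, columns and values base64-encoded); check the post and the GitHub samples for the authoritative version. The helper names `b64` and `build_rows_payload` and the sample rows are invented for illustration, and nothing is sent over the wire.

```python
import base64
import json


def b64(text):
    """Base64-encode a string; the REST gateway expects encoded keys and values."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_rows_payload(rows):
    """Build a JSON body for inserting several rows at once.

    rows maps a row key to a dict of {"family:qualifier": value}.
    The Row/key/Cell/column/$ layout is an assumption about the REST format.
    """
    return json.dumps({
        "Row": [
            {
                "key": b64(key),
                "Cell": [
                    {"column": b64(column), "$": b64(value)}
                    for column, value in cells.items()
                ],
            }
            for key, cells in rows.items()
        ]
    })


body = build_rows_payload({
    "row1": {"cf:a": "value1"},
    "row2": {"cf:a": "value2", "cf:b": "value3"},
})
```

The resulting string would go out as the body of a request with a JSON content type; the XML variant carries the same information in elements and attributes.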

April 10, 2013

Tim Berners-Lee Renounces XML?

Filed under: JSON,XML — Patrick Durusau @ 2:06 pm

Draft TAG Teleconference Minutes 4th of April 2013

In a discussion of ISSUE-34: XML Transformation and composability (e.g., XSLT,XInclude, Encryption) the following exchange takes place:

Noah: Lets go through the issues and see which we can close. … Processing model of XML. Is there any interest in this?

xmlFunctions-34

Tim: I’m happy to do things with XML. This came from when we’re talking about XML was processed. The meaning from XML has to be taken outside-in. Otherwise you cannot create new XML specifications that interweave with what exist. … Not clear people noticed that.

I note that tracker has several status codes we can assign, including OPEN, PENDING, REVIEW, POSTPONED, and CLOSED.

Tim: Henry did a lot more work on that. I don’t feel we need to put a whole lot of energy into XML at all. JSON is the new way for me. It’s much more straightforward.

Suggestion: if we think this is now resolved or uninteresting, CLOSE it; if we think it’s interesting but not now, then POSTPONED?

Tim: We need another concept besides OPEN/CLOSED. Something like NOT WORKING ON IT.

Noah: It has POSTPONED.

Tim: POSTPONED expresses a feeling of guilt. But there’s no guilt.

Noah: It’s close enough and I’m not looking forward to changing Tracker.

ht, you wanted to add 0.02USD

Henry: I’m happy to move this to the backburner. I think there’s a genuine issue here and of interest to the community but I don’t have the bandwidth.

Noah: We need to tell ourselves a story as to what these codes mean. … Historically we used CLOSED for “it’s in pretty good shape”.

Henry: I’m happy with POSTPONED and it’s better than CLOSED.

+1 for postponing

+1

RESOLUTION: We mark ISSUE-34 (xmlFunctions-34) POSTPONED

I think this is important, thanks for doing it noah

(emphasis added)

XML can be improved to be sure but the concept is not inherently flawed.

To JSON supporters, all I can say is that when it started, XML wasn’t the bloated confusion you see now.

April 7, 2013

RSSOwl and Feed Validation

Filed under: RSS,XML — Patrick Durusau @ 6:17 pm

I rather hate to end the day on a practical note, ;-), but after going off Google Reader, I started using RSSOwl.

I have been adding feeds to RSSOwl but there were two that simply refused to load.

Feed Validator reported the feed was:

not well-formed (invalid token)

with a pointer to the letter “f” in the word “find.”

Helpful but not a bunch.

Captured the feed as XML and loaded it into oXygen.

A form feed character was immediately in front of the “f” in “find” but of course was not displaying.

The culprit in one case was a form feed character (0x0c) and in the other an end-of-text character (0x03).

ASCII characters 0 through 31 are non-printing control characters called C0 controls; 127 (DEL) is also non-printing.

Of the C0 control characters, only carriage return (0x0d), linefeed (0x0a) and horizontal tab (0x09) can appear in an XML 1.0 feed.

For loading and parsing RSS feeds into a topic map, you may want to filter for C0 controls that should not appear in the XML feed.
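A filter along those lines might look like the sketch below. It assumes the feed has already been decoded to a Python string and that XML 1.0 rules apply; the function names `find_illegal_controls` and `strip_illegal_controls` are my own. DEL (0x7f) is left alone because XML 1.0 permits it, even if discouraged.

```python
import re

# Control characters below 0x20 that XML 1.0 does not allow:
# everything except tab (0x09), linefeed (0x0a) and carriage return (0x0d).
ILLEGAL_CONTROLS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_controls(text):
    """Return (offset, hex code) pairs for each offending character."""
    return [(m.start(), hex(ord(m.group()))) for m in ILLEGAL_CONTROLS.finditer(text)]


def strip_illegal_controls(text):
    """Remove the offending characters so an XML parser will accept the text."""
    return ILLEGAL_CONTROLS.sub("", text)


feed = "<title>\x0cfind the\tculprit\x03</title>\n"
problems = find_illegal_controls(feed)
clean = strip_illegal_controls(feed)
```

Reporting the offsets before stripping is deliberate: it tells you which feed entries were damaged, which a silent filter would hide.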

PS: I suspect in both cases the control characters were introduced by copy-n-paste operations.

March 17, 2013

XMLQuire Web Edition

Filed under: HTML5,Saxon,XML,XSLT — Patrick Durusau @ 5:50 am

XMLQuire Web Edition: A Free XSLT 2.0 Editor for the Web

From the webpage:

XSLT 2.0 processing within the browser is now a reality with the introduction of the open source Saxon-CE from Saxonica. This processor runs as a JavaScript app and supports JavaScript interoperability and user-event handling for the era of HTML5 and the dynamic web.

This Windows product, XMLQuire, is an XSLT editor specially extended to integrate with Saxon-CE and support the Saxon-CE language extensions that make interactive XSLT possible. Saxon-CE is not included with this product, but is available from Saxonica here.

*nix folks will have to install Windows 7 or 8 on a VM to take advantage of this software.

Worth the effort if for no other reason than to see how the market majority lives. 😉

I first saw this in a tweet by Michael Kay.

February 19, 2013

“…XML User Interfaces” As in Using XML?

Filed under: Conferences,Interface Research/Design,XML,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 1:00 pm

International Symposium on Native XML user interfaces

This came across the wire this morning and I need your help interpreting it.

Why would you want to have an interface to XML?

All these years I have been writing XML in Emacs because XML wasn’t supposed to have an interface.

Brave hearts, male, female and unknown, struggling with issues too obscure for mere mortals.

Now I find that isn’t supposed to be so? You can imagine my reaction.

I moved my laptop a bit closer to the peat fire to make sure I read it properly. Waiting for the ox cart later this week to take my complaint to the local bishop about this disturbing innovation.

😉

15 March 2013 — Peer review applications due
19 April 2013 — Paper submissions due
19 April 2013 — Applications for student support awards due
21 May 2013 — Speakers notified
12 July 2013 — Final papers due
5 August 2013 — International Symposium on Native XML user interfaces
6–9 August 2013 — Balisage: The Markup Conference

International Symposium on
Native XML user interfaces

Monday August 5, 2013 Hotel Europa, Montréal, Canada

XML is everywhere. It is created, gathered, manipulated, queried, browsed, read, and modified. XML systems need user interfaces to do all of these things. How can we make user interfaces for XML that are powerful, simple to use, quick to develop, and easy to maintain?

How are we building user interfaces today? How can we build them tomorrow? Are we using XML to drive our user interfaces? How?

This one-day symposium is devoted to the theory and practice of user interfaces for XML: the current state of implementations, practical case studies, challenges for users, and the outlook for the future development of the technology.

Relevant topics include:

  • Editors customized for specific purposes or users
  • User interfaces for creation, management, and use of XML documents
  • Uses of XForms
  • Making tools for creation of XML textual documents
  • Using general-purpose user-interface libraries to build XML interfaces
  • Looking at XML, especially looking at masses of XML documents
  • XML, XSLT, and XQuery in the browser
  • Specialized user interfaces for specialized tasks
  • XML vocabularies for user-interface specification

Presentations can take a variety of forms, including technical papers, case studies, and tool demonstrations (technical overviews, not product pitches).

This is the same conference I wrote about in: Markup Olympics (Balisage) [No Drug Testing].

In times of lean funding for conferences, if you go to a conference this year, it really should be Balisage.

You will be the envy of your co-workers and have tales to tell your grandchildren.

Not bad for one conference registration fee.

February 13, 2013

MarkLogic Announces Free Developer License for Enterprise [With Odd Condition]

Filed under: MarkLogic,NoSQL,XML — Patrick Durusau @ 5:46 am

MarkLogic Announces Free Developer License for Enterprise

From the post:

MarkLogic Corporation today announced the availability of a free Developer License for MarkLogic Enterprise Edition.

The Developer License provides access to the features available in MarkLogic Enterprise Edition, including integrated search, government-grade security, clustering, replication, failover, alerting, geospatial indexing, conversion, and a suite of application development tools. MarkLogic also announced the Mongo2MarkLogic converter, a Java-based tool for importing data from MongoDB into MarkLogic providing developers immediate access to features needed to build out enterprise-ready big data solutions.

“By providing a free Developer License we enable developers to quickly deliver reliable, scalable and secure information and analytic applications that are production-ready,” said Gary Bloom, CEO and President of MarkLogic. “Many of our customers first experimented with other free NoSQL products, but turned to MarkLogic when they recognized the need for search, security, support for ACID transactions and other features necessary for enterprise environments. Our goal is to eliminate the cost barrier for developers and give them access to the best enterprise NoSQL platform from the start.”

The Developer License for MarkLogic Enterprise Edition includes tools for faster application development, business intelligence (BI) tool integration, analytic functions and visualization tools, and the ability to create user-defined functions for fast and flexible analysis of huge volumes of data.

You would think that story would merit at least one link to the free developer program.

For your convenience: Developer License for Enterprise Edition. BTW, MarkLogic homepage.

That wasn’t hard. Two links and you have direct access to the topic of the story and the company.

One odd licensing condition:

Q. Can I publish my work done with MarkLogic Server?

A. We encourage you to share your work publicly, but note that you can not disclose, without MarkLogic prior written consent, any performance or capacity statistics or the results of any benchmark test performed on MarkLogic Server.

That sounds just a tad defensive doesn’t it?

I haven’t looked at MarkLogic for a couple of iterations but earlier versions had no need to fear statistics or benchmark tests.

Results vary depending on how testing is done but anyone authorized to recommend or sign acquisition orders should know that.

If they don’t, your organization has more serious problems than needing a MarkLogic server.
