Archive for the ‘XQuery’ Category

eXist-db v3.6.0 [Prediction for 2018: Multiple data/document leak tsunamis. Are You Ready?]

Monday, November 27th, 2017

eXist-db v3.6.0

From the post:

Features

  • Switched Collation support to use ICU4j.
  • Implemented XQuery 3.1 UCA (Unicode Collation Algorithm).
  • Implemented map type parameters for XQuery F&O 3.1 fn:serialize.
  • Implemented declare context item for XQuery 3.0.
  • Implemented XQuery 3.0 Regular Expression’s support for non-capturing groups.
  • Implemented a type-safe DSL for describing and testing transactional operations upon the database.
  • Implemented missing node kind tests in the XQuery parser when using @ on an AbbrevForwardStep.
  • Added AspectJ support to the IntelliJ project files (IntelliJ Ultimate only).
  • Repaired the dependencies in the NetBeans project files.
  • Added support for Travis macOS CI.
  • Added support for AppVeyor Windows CI.
  • Updated third-party dependencies:
    • Apache Commons Codec 1.11
    • Apache Commons Compress 1.15
    • Apache Commons Lang 3.7
    • Eclipse AspectJ 1.9.0.RC1
    • Eclipse Jetty 9.4.7.v20170914
    • EXPath HTTP Client 20171116
    • Java 8 Functional Utilities 1.11
    • JCTools 2.1.1
    • XML Unit 2.4.0

Performance Improvements

  • Compiled XQuery cache is now multi-threaded; concurrency is now per-source.
  • RESTXQ compiled XQuery cache is now multi-threaded; concurrency is now per-query URI.
  • STX Templates Cache is now multithreaded.
  • XML-RPC Server will now use Streaming and GZip compression if supported by the client; enabled in eXist’s Java Admin Client.
  • Reduced object creation overhead in the XML-RPC Server.

Apps

The bundled applications of the Documentation, eXide, and Monex have all been updated to the latest versions.
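
Two of the features above are easy to try from any conforming XQuery 3.1 processor. Here is a rough sketch (mine, not from the release notes) of the map-based serialization parameters and a UCA collation URI; how much of the UCA actually works depends on the ICU build behind the processor:

  (
    (: 1. Serialize with the 3.1 map-based serialization parameters. :)
    serialize(<greeting lang="en">Hello</greeting>,
              map { "method": "xml", "indent": true() }),

    (: 2. Sort with a UCA collation URI from F&O 3.1. :)
    sort(("Äpfel", "Zebra", "apple"),
         "http://www.w3.org/2013/collation/UCA?lang=de")
  )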

Prediction for 2018: Multiple data/document leak tsunamis.

Are you prepared?

How are your XQuery skills and tools?

Or do you plan on regurgitating news wire summaries?

XML Prague 2017 – 21 Reasons to Attend 2018 – Offensive Use of XQuery

Monday, November 13th, 2017

XML Prague 2017 Videos

Need reasons for your attending XML Prague 2018?

The XML Prague 2017 YouTube playlist has twenty-one (21) very good reasons (videos). (You may have to hold the hands of c-suite types if you share the videos with them.)

Two things I see missing from the presentations: security and the offensive use of XQuery.

XML Security

You may have noticed that corporations, governments and others have been hemorrhaging data in 2017 (and before). While legislators wail ineffectually and wish for an 18th-century world, the outlook for cybersecurity looks grim for 2018.

XML and XML applications exist in a law of the jungle security context. But there weren’t any presentations on security related issues at XML Prague in 2017. Are you going to break the ice in 2018?

Offensive use of XQuery

XQuery has the power to extract, enhance and transform data to serve your interests, not those of its authors.

I’ve heard the gospel that technologists should disarm themselves and righteously await a better day. Meanwhile, governments, military forces, banks, and their allies loot and rape the Earth and its peoples.

Are data scientists at the NSA, FSB, MSS, MI6, Mossad, CIA, etc., constrained by your “do no evil” creeds?

Present governments, or their successors, can move towards more planet- and people-friendly policies, but they require, ahem, encouragement.

XQuery, which doesn’t depend upon melting data centers, supercomputers, global data vacuuming, etc., can help supply that encouragement.

How would you use XQuery to transform government data to turn it against its originator?
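
If that question sounds abstract, here is a neutral sketch of the "extract, enhance and transform" step. The file and element names are hypothetical; the pattern is not:

  (: Re-aggregate a published spending feed around the question you
     care about, not the layout its publisher chose. :)
  for $payment in doc("spending.xml")//payment
  group by $vendor := $payment/vendor/string()
  order by sum($payment/amount) descending
  return <vendor name="{$vendor}" total="{sum($payment/amount)}"/>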

eXist-db Docker Image Builder

Saturday, November 11th, 2017

eXist-db Docker Image Builder

From the webpage:

Pre-built eXist-db Docker images have been published on Docker Hub. You can skip to Running an eXist-db Docker Image if you just want to use the provided Docker images.

If you want to ease your use of eXist-db or create a customized distribution of eXist-db, complete with additional resources, this rocks.

XPath and XQuery Assertions in SoapUI

Friday, November 3rd, 2017

The video, XPath and XQuery assertions in SoapUI in depth, drew my attention to SoapUI, but be forewarned: the sound quality was so bad I could not follow it. Still, I can now mention SoapUI and that’s not a bad thing.

The SoapUI documentation has extended examples for Validating XML Messages, Getting started with Assertions, and Transferring Property Values.
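
For flavor, the kind of expression such an assertion evaluates is plain XPath/XQuery over the SOAP response. Nothing below is SoapUI-specific, and the t namespace and element names are invented:

  declare namespace soap = "http://schemas.xmlsoap.org/soap/envelope/";
  declare namespace t    = "http://example.com/temperature";

  (: Assert the response carries exactly one numeric reading. :)
  count(//soap:Body/t:GetTempResponse/t:value[. castable as xs:decimal]) = 1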

SoapUI has the usual hand-waving about security but since critical airport security plans can be found in USB litter, I’m not sure anyone bothers. Your Amazon account root password is probably on a sticky note on someone’s monitor. Go check.

XML Prague 2018 – Apology to Procrastinators

Thursday, October 12th, 2017

Apology to all procrastinators, I just saw the Call for Proposals for XML Prague 2018

You only have 50 days (until November 30, 2017) to submit your proposals for XML Prague 2018.

Efficient people don’t realize that 50 days is hardly enough time to put off thinking about a proposal topic, much less fail to write down anything for a proposal. A completely unreasonable demand, but do try to procrastinate quickly and get a proposal done for XML Prague 2018.

The suggestion of doing a “…short video…” seems rife with potential for humor and/or NSFW images. Perhaps XML Prague will post the best “…short videos…” to YouTube?

From the webpage:

XML Prague 2018 now welcomes submissions for presentations on the following topics:

  • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
  • Semantic visions and the reality – micro-formats, semantic data in business, linked data
  • Publishing for the 21st century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
  • XML databases and Big Data – XML storage, indexing, query languages, …
  • State of the XML Union – updates on specs, the XML community news, …
  • XML success stories – real-world use cases of successful XML deployments

There are several different types of slots available during the conference and you can indicate your preferred slot during submission:

30 minutes
15 minutes
These slots are suitable for normal conference talks.
90 minutes (unconference)
Ideal for holding users meeting or workshop during the unconference day (Thursday).

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Submissions should contain original material and belong in the topics previously listed. Submissions which can be construed as product or service descriptions (adverts) will likely be deemed inappropriate. Other approaches such as use case studies are welcome but must be clearly related to conference topics.

Proposals can have several forms:

full paper
In our opinion still the ideal and classical way of proposing a presentation. A full paper gives reviewers enough information to properly assess your proposal.
extended abstract
A concise 1-4 page description of your topic. If you do not have time to write a full paper proposal, this is one possible way to go. Try to make your extended abstract concrete and specific. An abstract that is too short or vague will not convince reviewers that it is worth including in the conference schedule.
short video (max. 5 minutes)
If you are not a writing person but you still have something interesting to present, simply capture a short video (no longer than 5 minutes) containing part of your presentation. The video can capture you or it can be a screencast.

I mentioned XSLT security attacks recently, perhaps you could do something similar on XQuery? Other ways to use XML and related technologies to breach cybersecurity?

Do submit proposals and enjoy XML Prague 2018!

Procrastinators – Dates/Location for Balisage: The Markup Conference 2018

Wednesday, October 4th, 2017

Procrastinators can be caught short, without enough time for proper procrastination on papers and slides.

To ensure ample time for procrastination, Balisage: The Markup Conference 2018 has published its dates and location.

31 July 2018–3 August 2018 … Balisage: The Markup Conference
30 July 2018 … Symposium – topic to be announced
CAMBRiA Hotel & Suites
1 Helen Heneghan Way
Rockville, Maryland 20850
USA

For indecisive procrastinators, Balisage offers suggestions for your procrastination:

The 2017 program included papers discussing XML vocabularies, cutting-edge digital humanities, lossless JSON/XML roundtripping, reflections on concrete syntax and abstract syntax, parsing and generation, web app development using the XML stack, managing test cases, pipelining and micropipelining, electronic health records, rethinking imperative algorithms for XSLT and XQuery, markup and intellectual property, digitizing Ethiopian and Eritrean manuscripts, exploring “shapes” in RDF and their relationship to schema validation, exposing XML data to users of varying technical skill, test-suite management, and use case studies about large conversion applications, DITA, and SaxonJS.

Innovative procrastinators can procrastinate on other related topics, including any they find on the Master Topic List (ideas procrastinated on for prior Balisage conferences).

Take advantage of this opportunity to procrastinate early and long on your Balisage submissions. You and your audience will be glad you did!

PS: Don’t procrastinate on saying thank you to Tommie Usdin and company for another year of Balisage. Balisage improves XML theory and practice every year it is held.

XQuery (Walmsley – Updated 15 Sept. 2017) – Pagination Differences

Tuesday, September 19th, 2017

For those of you smart enough to own a copy of XQuery by Priscilla Walmsley, it was updated as of 15 September 2017.

There’s a four (4) page difference in length between the original edition (758 pages) and the updated version (762 pages).

One two (2) page addition is the new section “Specifying Serialization Parameters by Using a Map” plus an unnecessary page break following the introduction to example 13-4 (of the updated version).

Chapter 13, Inputs and Outputs, now ends on page 228 instead of 226.

The other two pages arise from the insertion of array:put following prefix-from-QName and before map:put, in Appendix A. Built-in Function Reference.
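
For readers who haven’t met it, array:put is the standard XQuery 3.1 function that returns a new array with one member replaced (arrays, like everything else in the data model, are immutable):

  let $colors := [ "red", "green", "blue" ]
  return array:put($colors, 2, "grey")   (: [ "red", "grey", "blue" ] :)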

I haven’t found any mention of the pagination difference, which will be confusing for students consulting Walmsley.

Since the edition number is not changing, putting the added four pages in an Appendix D, or even in preface material numbered i, ii, …, would have preserved references across the first and second versions.

XQuery should be widely used. Creating unnecessary friction for using XQuery resources doesn’t advance that goal.

xqerl (“We always do it nice and rough” Tina Turner)

Thursday, September 14th, 2017

xqerl

From the webpage:

Erlang XQuery 3.1 Processor

This is currently a draft/proof-of-concept. Please don’t try to use it for “real” computing!

It is passing about 91% of its (~25k) test cases.

Features it has:

  • Module Feature
  • Higher-Order Function Feature

Features it does not have, but might later:

  • XQuery Update Facility
  • Schema Aware Feature
  • Typed Data Feature
  • Static Typing Feature
  • Serialization Feature

If you want to combine an interest in Erlang along with XQuery 3.1, you have arrived!
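
A one-liner exercising the Higher-Order Function Feature it does claim; being standard XQuery 3.1, it should behave the same in xqerl as in any other conforming processor:

  (: Pass an anonymous function to fold-left to sum a sequence. :)
  fold-left(1 to 5, 0, function($acc, $n) { $acc + $n })   (: 15 :)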

Decide for yourself which is the “nice” part and which is the “rough.”

Enjoy!

Statistical Functions for XQuery 3.1 (see OpenFormula)

Saturday, June 24th, 2017

simple-statsxq by Tim Thompson.

From the webpage:

Loosely inspired by the JavaScript simple-statistics project. The goal of this module is to provide a basic set of statistical functions for doing data analysis in XQuery 3.1.

Functions are intended to be implementation-agnostic.

Unit tests were written using the unit testing module of BaseX.
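
As a taste of what “implementation-agnostic” means here, a mean function needs nothing beyond the standard library. This is my sketch, not Thompson’s code; see the repository for the real module:

  declare function local:mean($nums as xs:double*) as xs:double? {
    if (empty($nums)) then () else sum($nums) div count($nums)
  };

  local:mean((2, 4, 4, 4, 5, 5, 7, 9))   (: 5 :)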

OpenFormula (Open Document Format for Office Applications (OpenDocument)) defines eighty-seven (87) statistical functions.

There are fifty-five (55) financial functions defined by OpenFormula, just in case you are interested.

Open data quality – Subject Identity By Another Name

Thursday, June 8th, 2017

Open data quality – the next shift in open data? by Danny Lämmerhirt and Mor Rubinstein.

From the post:

Some years ago, open data was heralded to unlock information to the public that would otherwise remain closed. In the pre-digital age, information was locked away, and an array of mechanisms was necessary to bridge the knowledge gap between institutions and people. So when the open data movement demanded “Openness By Default”, many data publishers followed the call by releasing vast amounts of data in its existing form to bridge that gap.

To date, it seems that opening this data has not reduced but rather shifted and multiplied the barriers to the use of data, as Open Knowledge International’s research around the Global Open Data Index (GODI) 2016/17 shows. Together with data experts and a network of volunteers, our team searched, accessed, and verified more than 1400 government datasets around the world.

We found that data is often stored in many different places on the web, sometimes split across documents, or hidden many pages deep on a website. Often data comes in various access modalities. It can be presented in various forms and file formats, sometimes using uncommon signs or codes that are in the worst case only understandable to their producer.

As the Open Data Handbook states, these emerging open data infrastructures resemble the myth of the ‘Tower of Babel’: more information is produced, but it is encoded in different languages and forms, preventing data publishers and their publics from communicating with one another. What makes data usable under these circumstances? How can we close the information chain loop? The short answer: by providing ‘good quality’ open data.

Congratulations to Open Knowledge International on re-discovering the ‘Tower of Babel’ problem that prevents easy re-use of data.

Contrary to Lämmerhirt and Rubinstein’s claim, barriers have not “…shifted and multiplied….” More accurate to say Lämmerhirt and Rubinstein have experienced what so many other researchers have found for decades:


We found that data is often stored in many different places on the web, sometimes split across documents, or hidden many pages deep on a website. Often data comes in various access modalities. It can be presented in various forms and file formats, sometimes using uncommon signs or codes that are in the worst case only understandable to their producer.

The record linkage community, think medical epidemiology, has been working on aspects of this problem since the 1950s at least (under that name). It has a rich and deep history, focused in part on mapping diverse data sets to a common representation and then performing analysis upon the resulting set.

A common omission in record linkage is capturing, in a discoverable format, the basis for mapping the diverse records to a common format. That is, the subjects represented by “…uncommon signs or codes that are in the worst case only understandable to their producer,” which Lämmerhirt and Rubinstein complain of, although signs and codes need not be “uncommon” to be misunderstood by others.

To their credit, unlike RDF and the topic maps default, record linkage has long recognized that identification consists of multiple parts and not single strings.

Topic maps, at least at their inception, were unaware of record linkage and the vast body of research done under that moniker. Topic maps were bitten by the very problem they were seeking to solve: a subject could be identified in many different ways, and information discovered by others about that subject could be nearby but undiscoverable/unknown.

Rather than building on the experience with record linkage, topic maps, at least in the XML version, defaulted to relying on URLs to identify the location of subjects (resources) and/or to identify subjects (identifiers). Avoiding the Philosophy 101 mistakes of RDF (confusing locators with identifiers, then refusing to correct the confusion) wasn’t enough for topic maps to become widespread. One suspects in part because topic maps were premised on creating more identifiers for subjects which already had them.

Imagine that your company has 1,000 employees and, in order to use a new system, say topic maps, everyone must get a new name. Can’t use the old one. Do you see a problem? Now multiply that by every subject anyone in your company wants to talk about. We won’t run out of identifiers but your staff will certainly run out of patience.

Robust solutions to the open data ‘Tower of Babel’ issue will include the use of multi-part identifications extant in data stores, dynamic creation of multi-part identifications when necessary (note, no change to existing data store), discoverable documentation of multi-part identifications and their mappings, where syntax and data models are up to the user of data.

That sounds like a job for XQuery to me.

You?
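
If you want a concrete starting point, here is a minimal sketch of multi-part matching in XQuery. The record shapes are hypothetical; the point is that the identification criteria are explicit and discoverable rather than buried in someone’s head:

  (: Two datasets identify the same person differently; match on a
     combination of properties rather than a single magic string. :)
  declare function local:key($rec as element()) as xs:string {
    string-join(
      (lower-case($rec/surname), $rec/birth-year, $rec/postcode), "|")
  };

  for $a in doc("registry-a.xml")//person,
      $b in doc("registry-b.xml")//person
  where local:key($a) = local:key($b)
  return <match a="{$a/@id}" b="{$b/@id}"/>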

Balisage: The Markup Conference 2017 Program Now Available

Wednesday, May 17th, 2017

Balisage: The Markup Conference 2017 Program Now Available

An email from Tommie Usdin, Chair, Chief Organizer and herder of markup cats for Balisage advises:

Balisage: where serious markup practitioners and theoreticians meet every August.

The 2017 program includes papers discussing XML vocabularies, cutting-edge digital humanities, lossless JSON/XML roundtripping, reflections on concrete syntax and abstract syntax, parsing and generation, web app development using the XML stack, managing test cases, pipelining and micropipelining, electronic health records, rethinking imperative algorithms for XSLT and XQuery, markup and intellectual property, why YOU should use (my favorite XML vocabulary), developing a system to aid in studying manuscripts in the tradition of the Ethiopian and Eritrean Highlands, exploring “shapes” in RDF and their relationship to schema validation, exposing XML data to users of varying technical skill, test-suite management, and use case studies about large conversion applications, DITA, and SaxonJS.

Up-Translation and Up-Transformation: A one-day Symposium on the goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks. The symposium will be of value not only to those facing up-translation and transformation but also to general XML practitioners seeking to get the most out of their data.

Are you interested in open information, reusable documents, and vendor and application independence? Then you need descriptive markup, and Balisage is your conference. Balisage brings together document architects, librarians, archivists, computer scientists, XML practitioners, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, semantic-Web evangelists, standards developers, academics, industrial researchers, government and NGO staff, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Some participants are busy designing replacements for XML while others still use SGML (and know why they do).

Discussion is open, candid, and unashamedly technical.

Balisage 2017 Program:
http://www.balisage.net/2017/Program.html

Symposium Program:
https://www.balisage.net/UpTransform

NOTE: Members of the TEI and their employees are eligible for discount Balisage registration.

You need to see the program for yourself but the highlights (for me) include: Ethiopic manuscripts (ok, so I have odd tastes), Earley parsers (of particular interest), English Majors (my wife was an English major), and a number of other high points.

Mark your calendar for July 31 – August 4, 2017 – It’s Balisage!

Pure CSS crossword – CSS Grid

Wednesday, April 19th, 2017

Pure CSS crossword – CSS Grid by Adrian Roworth.

The UI is slick, although creating the puzzle remains on you.

Certainly suitable for string answers, XQuery/XPath/XSLT expressions, etc.

Enjoy!

XQuery 3.1 and Company! (Deriving New Versions?)

Wednesday, March 22nd, 2017

XQuery 3.1: An XML Query Language W3C Recommendation 21 March 2017

Hurray!

Related reading of interest:

XML Path Language (XPath) 3.1

XPath and XQuery Functions and Operators 3.1

XQuery and XPath Data Model 3.1

These recommendations are subject to licenses that read in part:

No right to create modifications or derivatives of W3C documents is granted pursuant to this license, except as follows: To facilitate implementation of the technical specifications set forth in this document, anyone may prepare and distribute derivative works and portions of this document in software, in supporting materials accompanying software, and in documentation of software, PROVIDED that all such works include the notice below. HOWEVER, the publication of derivative works of this document for use as a technical specification is expressly prohibited.

You know I think the organization of XQuery 3.1 and friends could be improved but deriving and distributing “improved” versions is expressly prohibited.

Hmmm, but we are talking about XML and languages to query and transform XML.

Consider the potential of a query that calls XQuery 3.1: An XML Query Language and materials cited in it, then returns a version of XQuery 3.1 that has definitions from other standards offset in the XQuery 3.1 text.

Or one that inserts examples or other materials into the text.
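
Whether this works directly depends on the published spec being fetchable as well-formed markup, and the class attribute used to flag definitions below is my assumption, not a documented contract. Still, the shape of such a query is short:

  (: Pull the formal definitions out of the XQuery 3.1 spec. :)
  for $def in doc("https://www.w3.org/TR/xquery-31/")
              //*:p[contains(@class, "def")]
  return <definition>{ normalize-space($def) }</definition>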

For decades XML enthusiasts have bruited about dynamic texts but have produced damned few of them (as in zero) for their standards.

Let’s use the “no derivatives” language of the W3C as an incentive to not create another static document but a dynamic one that can grow or contract according to the wishes of its reader.

Suggestions for first round features?

Balisage Papers Due in 3 Weeks!

Thursday, March 16th, 2017

Apologies for the sudden lack of posting but I have been working on a rather large data set with XQuery and checking forwards and backwards to make sure it can be replicated. (I hate “it works on my computer.”)

Anyway, Tommie Usdin dropped an email bomb today with a reminder that Balisage papers are due on April 7, 2017.

From her email:

Submissions to “Balisage: The Markup Conference” and pre-conference symposium:
“Up-Translation and Up-Transformation: Tasks, Challenges, and Solutions”
are due on April 7.

It is time to start writing!

Balisage: The Markup Conference 2017
August 1 — 4, 2017, Rockville, MD (a suburb of Washington, DC)
July 31, 2017 — Symposium Up-Translation and Up-Transformation
https://www.balisage.net/

Balisage: where serious markup practitioners and theoreticians meet every August. We solicit papers on any aspect of markup and its uses; topics include but are not limited to:

• Web application development with XML
• Informal data models and consensus-based vocabularies
• Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
• Performance issues in parsing, XML database retrieval, or XSLT processing
• Development of angle-bracket-free user interfaces for non-technical users
• Semistructured data and full text search
• Deployment of XML systems for enterprise data
• Web application development with XML
• Design and implementation of XML vocabularies
• Case studies of the use of XML for publishing, interchange, or archiving
• Alternatives to XML
• the role(s) of XML in the application lifecycle
• the role(s) of vocabularies in XML environments

Detailed Call for Participation: http://balisage.net/Call4Participation.html
About Balisage: http://balisage.net/Call4Participation.html

pre-conference symposium:
Up-Translation and Up-Transformation: Tasks, Challenges, and Solutions
Chair: Evan Owens, Cenveo
https://www.balisage.net/UpTransform/index.html

Increasing the granularity and/or specificity of markup is an important task in many content and information workflows. Markup transformations might involve tasks such as high-level structuring, detailed component structuring, or enhancing information by matching or linking to external vocabularies or data. Enhancing markup presents secondary challenges including lack of structure of the inputs or inconsistency of input data down to the level of spelling, punctuation, and vocabulary. Source data for up-translation may be XML, word processing documents, plain text, scanned & OCRed text, or databases; transformation goals may be content suitable for page makeup, search, or repurposing, in XML, JSON, or any other markup language.

The range of approaches to up-transformation is as varied as the variety of specifics of the input and required outputs. Solutions may combine automated processing with human review or could be 100% software implementations. With the potential for requirements to evolve over time, tools may have to be actively maintained and enhanced. This is the place to discuss goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks.

For more information: info@balisage.net or +1 301 315 9631

I’m planning on posting tomorrow one way or the other!

While you wait for that, get to work on your Balisage paper!

XQuery Ready CIA Vault7 Files

Friday, March 10th, 2017

I have extracted the HTML files from WikiLeaks Vault7 Year Zero 2017 V 1.7z, processed them with Tidy (see note on correction below), and uploaded the “tidied” HTML files to: Vault7-CIA-Clean-HTML-Only.

Beyond the usual activities of Tidy, I did have to correct the file page_26345506.html by creating a comment around one line of content:

<!-- <declarations><string name="½ö"></string></declarations><p><br> -->

Otherwise, the files are only corrected HTML markup with no other changes.

The HTML compresses well, 7811 files coming in at 3.4 MB.

Demonstrate the power of basic XQuery skills!
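
As a first pass, once the tidied files are in an XML database (BaseX shown here; the database name below is whatever you chose at import time), counting documents and skimming titles takes a couple of lines:

  let $docs := collection("vault7")
  return (
    count($docs),                                 (: expect 7811 :)
    (for $t in $docs//*:title
     return normalize-space($t))[position() le 10]
  )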

Enjoy!

How Bad Is Wikileaks Vault7 (CIA) HTML?

Thursday, March 9th, 2017

How bad?

Unless you want to hand-correct 7809 HTML files to use with XQuery, grab the latest copy of Tidy.

It’s not the worst HTML I have ever seen, but put that in the context of having seen a lot of really poor HTML.

I’ve “tidied” up a test collection and will grab a fresh copy of the files before producing and releasing a clean set of the HTML files.

I’m producing a document collection for XQuery processing, working towards something suitable for applying NLP and other tools.

Oxford Dictionaries Thesaurus Data – XQuery

Sunday, February 12th, 2017

Retrieve Oxford Dictionaries API Thesaurus Data as XML with XQuery and BaseX by Adam Steffanick.

From the post:

We retrieved thesaurus data from the Oxford Dictionaries application programming interface (API) and returned Extensible Markup Language (XML) with XQuery, an XML query language, and BaseX, an XML database engine and XQuery processor. This tutorial illustrates how to retrieve thesaurus data—synonyms and antonyms—as XML from the Oxford Dictionaries API with XQuery and BaseX.

The Oxford Dictionaries API returns JavaScript Object Notation (JSON) responses that yield undesired XML structures when converted automatically with BaseX. Fortunately, we’re able to use XQuery to fill in some blanks after converting JSON to XML. My GitHub repository od-api-xquery contains XQuery code for this tutorial.
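
Steffanick’s actual code is in the linked gist; the sketch below only shows the general shape, using the EXPath HTTP Client module that BaseX ships. The endpoint path and header names follow the Oxford Dictionaries API documentation as it stood in early 2017, so treat them as assumptions, check the current docs, and supply your own credentials:

  import module namespace http = "http://expath.org/ns/http-client";

  declare variable $word := "courage";

  let $response :=
    http:send-request(
      <http:request method="get" override-media-type="text/plain">
        <http:header name="app_id"  value="YOUR_APP_ID"/>
        <http:header name="app_key" value="YOUR_APP_KEY"/>
      </http:request>,
      "https://od-api.oxforddictionaries.com/api/v1/entries/en/"
        || $word || "/synonyms")
  return json-to-xml($response[2])   (: JSON body in, XML out :)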

If you are having trouble staying at your computer during this unreasonably warm spring, this XQuery/Oxford Dictionary tutorial may help!

Ok, maybe that is an exaggeration but only a slight one. 😉

Enjoy!

Functional Programming in Erlang – MOOC – 20 Feb. 2017

Wednesday, February 8th, 2017

Functional Programming in Erlang with Simon Thompson (co-author of Erlang Programming)

From the webpage:

Functional programming is increasingly important in providing global-scale applications on the internet. For example, it’s the basis of the WhatsApp messaging system, which has over a billion users worldwide.

This free online course is designed to teach the principles of functional programming to anyone who’s already able to program, but wants to find out more about the novel approach of Erlang.

Learn the theory of functional programming and apply it in Erlang

The course combines the theory of functional programming and the practice of how that works in Erlang. You’ll get the opportunity to reinforce what you learn through practical exercises and more substantial, optional practical projects.

Over three weeks, you’ll:

  • learn why Erlang was developed, how its design was shaped by the context in which it was used, and how Erlang can be used in practice today;
  • write programs using the concepts of functional programming, including, in particular, recursion, pattern matching and immutable data;
  • apply your knowledge of lists and other Erlang data types in your programs;
  • and implement higher-order functions using generic patterns.

The course will also help you if you are interested in Elixir, which is based on the same virtual machine as Erlang, and shares its fundamental approach as well as its libraries, and indeed will help you to get going with any functional language, and any message-passing concurrency language – for example, Google Go and the Akka library for Scala/Java.

If you are not excited already, remember that XQuery is a functional programming language. What if your documents were “immutable data?”

Use #FLerlangfunc to see Twitter discussions on the course.

That looks like a committee drafted hashtag. 😉

Digital Humanities / Studies: U.Pitt.Greenberg

Wednesday, February 1st, 2017

Digital Humanities / Studies: U.Pitt.Greenberg maintained by Elisa E. Beshero-Bondar.

I discovered this syllabus and course materials by accident when one of its modules on XQuery turned up in a search. Backing out of that module I discovered this gem of a digital humanities course.

The course description:

Our course in “digital humanities” and “digital studies” is designed to be interdisciplinary and practical, with an emphasis on learning through “hands-on” experience. It is a computer course, but not a course in which you learn programming for the sake of learning a programming language. It’s a course that will involve programming, and working with coding languages, and “putting things online,” but it’s not a course designed to make you, in fifteen weeks, a professional website designer. Instead, this is a course in which we prioritize what we can investigate in the Humanities and related Social Sciences fields about cultural, historical, and literary research questions through applications in computer coding and programming, which you will be learning and applying as you go in order to make new discoveries and transform cultural objects—what we call “texts” in their complex and multiple dimensions. We think of “texts” as the transmittable, sharable forms of human creativity (mainly through language), and we interface with a particular text in multiple ways through print and electronic “documents.” When we refer to a “document,” we mean a specific instance of a text, and much of our work will be in experimenting with the structures of texts in digital document formats, accessing them through scripts we write in computer code—scripts that in themselves are a kind of text, readable both by humans and machines.

Your professors are scholars and teachers of humanities, not computer programmers by trade, and we teach this course from our backgrounds (in literature and anthropology, respectively). We teach this course to share coding methods that are highly useful to us in our fields, with an emphasis on working with texts as artifacts of human culture shaped primarily with words and letters—the forms of “written” language transferable to many media (including image and sound) that we can study with computer modelling tools that we design for ourselves based on the questions we ask. We work with computers in this course as precision instruments that help us to read and process great quantities of information, and that lead us to make significant connections, ask new kinds of questions, and build models and interfaces to change our reading and thinking experience as people curious about human history, culture, and creativity.

Our focus in this course is primarily analytical: to apply computer technologies to represent and investigate cultural materials. As we design projects together, you will gain practical experience in editing and you will certainly fine-tune your precision in writing and thinking. We will be working primarily with eXtensible Markup Language (XML) because it is a powerful tool for modelling texts that we can adapt creatively to our interests and questions. XML represents a standard in adaptability and human-readability in digital code, and it works together with related technologies with which you will gain working experience: You’ll learn how to write XPath expressions: a formal language for searching and extracting information from XML code which serves as the basis for transforming XML into many publishable forms, using XSLT and XQuery. You’ll learn to write XSLT: a programming “stylesheet” transforming language designed to convert XML to publishable formats, as well as XQuery, a query (or search) language for extracting information from XML files bundled collectively. You will learn how to design your own systematic coding methods to work on projects, and how to write your own rules in schema languages (like Schematron and Relax-NG) to keep your projects organized and prevent errors. You’ll gain experience with an international XML language called TEI (after the Text Encoding Initiative) which serves as the international standard for coding digital archives of cultural materials. Since one of the best and most widely accessible ways to publish XML is on the worldwide web, you’ll gain working experience with HTML code (a markup language that is a kind of XML) and styling HTML with Cascading Stylesheets (CSS). We will do all of this with an eye to your understanding how coding works—and no longer relying without question on expensive commercial software as the “only” available solution, because such software is usually not designed with our research questions in mind.

We think you’ll gain enough experience at least to become a little dangerous, and at the very least more independent as investigators and makers who wield computers as fit instruments for your own tasks. Your success will require patience, dedication, and regular communication and interaction with us, working through assignments on a daily basis. Your success will NOT require perfection, but rather your regular efforts throughout the course, your documenting of problems when your coding doesn’t yield the results you want. Homework exercises are a back-and-forth, intensive dialogue between you and your instructors, and we plan to spend a great deal of time with you individually over these as we work together. Our guiding principle in developing assignments and working with you is that the best way for you to learn and succeed is through regular practice as you hone your skills. Our goal is not to make you expert programmers (as we are far from that ourselves)! Rather, we want you to learn how to manipulate coding technologies for your own purposes, how to track down answers to questions, how to think your way algorithmically through problems and find good solutions.

Skimming the syllabus rekindles an awareness of the distinction between the “hard” sciences and the “difficult” ones.

Enjoy!

Update:

After yesterday’s post, Elisa Beshero-Bondar tweeted that this one course is now two:

At a new homepage: newtFire {dh|ds}!

Enjoy!

Up-Translation and Up-Transformation … [Balisage Rocks!]

Sunday, January 29th, 2017

Up-Translation and Up-Transformation: Tasks, Challenges, and Solutions (a Balisage pre-conference symposium)

When & Where:

Monday July 31, 2017
CAMBRiA Hotel, Rockville, MD USA

Chair: Evan Owens, Cenveo

You need more details than that?

Ok, from the webpage:

Increasing the granularity and/or specificity of markup is an important task in many different content and information workflows. Markup transformations might involve tasks such as high-level structuring, detailed component structuring, or enhancing information by matching or linking to external vocabularies or data. Enhancing markup presents numerous secondary challenges including lack of structure of the inputs or inconsistency of input data down to the level of spelling, punctuation, and vocabulary. Source data for up-translation may be XML, word processing documents, plain text, scanned & OCRed text, or databases; transformation goals may be content suitable for page makeup, search, or repurposing, in XML, JSON, or any other markup language.

The range of approaches to up-transformation is as varied as the variety of specifics of the input and required outputs. Solutions may combine automated processing with human review or could be 100% software implementations. With the potential for requirements to evolve over time, tools may have to be actively maintained and enhanced.

The presentations in this pre-conference symposium will include goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks. The symposium will be of value not only to those facing up-translation and transformation but also to general XML practitioners seeking to get the most out of their data.

If I didn’t know better, I would say up-translation and up-transformation sound suspiciously like conferred properties of topic maps fame.

Well, modulo that conferred properties could be predicated on explicit subject identity and not hidden in the personal knowledge of the author.

There are two categories of up-translation and up-transformation:

  1. Ones that preserve jobs, like spaghetti COBOL code, and
  2. Ones that support easy long-term maintenance.

While writing your paper for the pre-conference symposium, which category fits yours best?

XQuery Update 3.0 – WG Notes

Tuesday, January 24th, 2017

A tweet by Jonathan Robie points out:

XQuery Update Facility 3.0

and

XQuery Update Facility 3.0 Requirements and Use Cases

have been issued as working group notes.

It’s not clear to me why anyone continues to think of data as mutable.

Mutable data was an artifact of limited storage and processing.

Neither of which obtains in a modern computing environment. (Yes, there are edge cases, the Large Hadron Collider for example.)

Still, if you’re interested, read on!
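
For anyone who does read on, the flavor of the facility fits in two lines (document and element names made up). Update expressions return no value; they build a pending update list that the processor applies at the end of the query:

  (: Stamp every entry that lacks a @seen attribute, in place. :)
  for $e in doc("log.xml")//entry[not(@seen)]
  return insert node attribute seen { current-dateTime() } into $e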

BaseX 8.6 Is Out!

Tuesday, January 24th, 2017

Email from Christian Grün arrived today with great news!

BaseX 8.6 is out! The new version of our XML database system and XQuery 3.1 processor includes countless improvements and optimizations. Many of them have been triggered by your valuable feedback, many others are the result of BaseX used in productive and commercial environments.

The most prominent new features are:

LOCKING

  • jobs without database access will never be locked
  • read transactions are now favored (adjustable via FAIRLOCK)

RESTXQ

  • file monitoring was improved (adjustable via PARSERESTXQ)
  • authentication was reintroduced (no passwords anymore in web.xml)
  • id session attributes will show up in log data

DBA

  • always accessible, even if job queue is full
  • pagination of table results

INDEXING

  • path index improved: distinct values storage for numeric types

XQUERY

  • aligned with latest version of XQuery 3.1
  • updated functions: map:find, map:merge, fn:sort, array:sort, …
  • enhancements in User, Process, Jobs, REST and Database Module

CSV DATA

  • improved import/export compatibility with Excel data

Visit http://basex.org to find the latest release, and check out http://docs.basex.org/ to get more information. As always, we are looking forward to your feedback. Enjoy!
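
A quick taste of two of the “updated functions” named above, map:find and fn:sort; the behaviour shown is per the XQuery 3.1 Functions and Operators draft, nothing BaseX-specific (the data is invented):

  let $cfg := map { "db": map { "user": "admin", "port": 1984 } }
  return (
    map:find($cfg, "port"),                          (: [ 1984 ] :)
    sort(("b", "aa", "ccc"), (), string-length#1)    (: "b", "aa", "ccc" :)
  )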

If one or more of your colleagues goes missing this week, suspect BaseX 8.6 and the new W3C drafts for XQuery are responsible.

😉

XQuery/XSLT Proposals – Comments by 28 February 2017

Tuesday, January 24th, 2017

Proposed Recommendations Published for XQuery WG and XSLT WG.

From the webpage:

The XML Query Working Group and XSLT Working Group have published a Proposed Recommendation for four documents:

  • XQuery and XPath Data Model 3.1: This document defines the XQuery and XPath Data Model 3.1, which is the data model of XML Path Language (XPath) 3.1, XSL Transformations (XSLT) Version 3.0, and XQuery 3.1: An XML Query Language. The XQuery and XPath Data Model 3.1 (henceforth “data model”) serves two purposes. First, it defines the information contained in the input to an XSLT or XQuery processor. Second, it defines all permissible values of expressions in the XSLT, XQuery, and XPath languages.
  • XPath and XQuery Functions and Operators 3.1: The purpose of this document is to catalog the functions and operators required for XPath 3.1, XQuery 3.1, and XSLT 3.0. It defines constructor functions, operators, and functions on the datatypes defined in XML Schema Part 2: Datatypes Second Edition and the datatypes defined in XQuery and XPath Data Model (XDM) 3.1. It also defines functions and operators on nodes and node sequences as defined in the XQuery and XPath Data Model (XDM) 3.1.
  • XML Path Language (XPath) 3.1: XPath 3.1 is an expression language that allows the processing of values conforming to the data model defined in XQuery and XPath Data Model (XDM) 3.1. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic values, function items, and sequences.
  • XSLT and XQuery Serialization 3.1: This document defines serialization of an instance of the data model as defined in XQuery and XPath Data Model (XDM) 3.1 into a sequence of octets. Serialization is designed to be a component that can be used by other specifications such as XSL Transformations (XSLT) Version 3.0 or XQuery 3.1: An XML Query Language.

Comments are welcome through 28 February 2017.

Get your red pen out!

Unlike political flame wars on social media, comments on these proposed recommendations could make a useful difference.

Enjoy!

XML.com Relaunch!

Monday, January 16th, 2017

XML.com

Lauren Wood posted this note about the relaunch of XML.com recently:

I’ve relaunched XML.com (for some background, Tim Bray wrote an article here: https://www.xml.com/articles/2017/01/01/xmlcom-redux/). I’m hoping it will become part of the community again, somewhere for people to post their news (submit your news here: https://www.xml.com/news/submit-news-item/) and articles (see the guidelines at https://www.xml.com/about/contribute/). I added a job board to the site as well (if you’re in Berlin, Germany, or able to move there, look at the job currently posted; thanks LambdaWerk!); if your employer might want to post XML-related jobs please email me.

The old content should mostly be available but some articles were previously available at two (or more) locations and may now only be at one; try the archive list (https://www.xml.com/pub/a/archive/) if you’re looking for something. Please let me know if something major is missing from the archives.

XML is used in a lot of areas, and there is a wealth of knowledge in this community. If you’d like to write an article, send me your ideas. If you have comments on the site, let me know that as well.

Just in time as President Trump is about to stir, vigorously, that big pot of crazy known as federal data.

Mapping, processing, transformation demands will grow at an exponential rate.

Notice the emphasis on demand.

Taking two weeks to write custom software to sort files (you know the Weiner/Abedin laptop story, yes?) won’t be acceptable for much longer.

How are your on-demand XML chops?

Looking up words in the OED with XQuery [Details on OED API Key As Well]

Saturday, January 14th, 2017

Looking up words in the OED with XQuery by Clifford Anderson.

Clifford has posted a gist of work from the @VandyLibraries XQuery group, looking up words in the Oxford English Dictionary (OED) with XQuery.

To make full use of Clifford’s post, you will need an API key for the Oxford Dictionaries API.

If you go straight to the regular Oxford English Dictionary (I’m omitting the URL so you don’t make the same mistake), there is nary a mention of the Oxford Dictionaries API.

The free plan allows 3K queries a month.

Not enough to shut out the outside world for the next four/eight years but enough to decide if it’s where you want to hide.

Application for the free API key was simple enough.

Save that the dumb password checker insisted on one or more special characters, plus one or more digits, plus upper and lowercase. When you get beyond 12 characters the insistence on a special character is just a little lame.

Email response with the key was fast, so I’m in!

What about you?

BaseX – Bad News, Good News

Wednesday, January 4th, 2017

Good news and bad news from BaseX. Christian Grün posted to the BaseX discussion list today:

Dear all,

This year, there will be no BaseX Preconference Meeting in XML Prague. Sorry for that! The reason is simple: We were too busy in order to prepare ourselves for the event, look for speakers, etc.

At least we are happy to announce that BaseX 8.6 will finally be released this month. Stay tuned!

All the best,

Christian

That’s disappointing but understandable. Console yourself by watching presentations from 2013–2016 and reviewing the issues for 8.6.

Just a guess on my part, ;-), but I suspect more testing of BaseX builds would not go unappreciated.

Something to keep yourself busy while waiting for BaseX 8.6 to drop.

XQuery/XPath CRs 3.1! [#DisruptJ20 Twitter Game]

Tuesday, December 13th, 2016

Just in time for the holidays, new CRs for XQuery/XPath hit the street! Comments due by 2017-01-10.

XQuery and XPath Data Model 3.1 https://www.w3.org/TR/2016/CR-xpath-datamodel-31-20161213/

XML Path Language (XPath) 3.1 https://www.w3.org/TR/2016/CR-xpath-31-20161213/

XQuery 3.1: An XML Query Language https://www.w3.org/TR/2016/CR-xquery-31-20161213/

XPath and XQuery Functions and Operators 3.1 https://www.w3.org/TR/2016/CR-xpath-functions-31-20161213/

XQueryX 3.1 https://www.w3.org/TR/2016/CR-xqueryx-31-20161213/

#DisruptJ20 comes too late for comments to the W3C, but you can break the boredom of indeterminate waiting to protest excitedly for TV cameras and/or to be arrested.

How?

Play the XQuery/XPath 3.1 Twitter Game!

Definitions litter the drafts and appear as:

[Definition: A sequence is an ordered collection of zero or more items.]

You Tweet:

An ordered collection of zero or more items? #xquery

Correct response:

A sequence.

Some definitions are too long to be tweeted in full:

An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. (xpath-functions)

Suggest you tweet:

A triple containing namespace prefix (optional), namespace URI (optional), and local name.

or

A value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1].

In both cases, the correct response:

An expanded-QName.

Use a $10 burner phone and leave it unlocked at protests. If your phone is searched, imagine the attempts to break the “code.”

You could agree on definitions/responses as instructions for direct action. But I digress.

4 Days Left – Submission Alert – XML Prague

Sunday, December 11th, 2016

A tweet by Jirka Kosek reminded me there are only 4 days left for XML Prague submissions!

  • December 15th – End of CFP (full paper or extended abstract)
  • January 8th – Notification of acceptance/rejection of paper to authors
  • January 29th – Final paper

From the call for papers:

XML Prague 2017 now welcomes submissions for presentations on the following topics:

  • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
  • Semantic visions and the reality – micro-formats, semantic data in business, linked data
  • Publishing for the 21st century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
  • XML databases and Big Data – XML storage, indexing, query languages, …
  • State of the XML Union – updates on specs, the XML community news, …

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Accepted papers will be included in published conference proceedings.

I don’t travel but if you need a last-minute co-author or proofer, you know where to find me!

Manipulate XML Text Data Using XQuery String Functions

Tuesday, November 22nd, 2016

Manipulate XML Text Data Using XQuery String Functions by Adam Steffanick.

Adam’s notes from the Vanderbilt University XQuery Working Group:

We evaluated and manipulated text data (i.e., strings) within Extensible Markup Language (XML) using string functions in XQuery, an XML query language, and BaseX, an XML database engine and XQuery processor. This tutorial covers the basics of how to use XQuery string functions and manipulate text data with BaseX.

We used a limited dataset of English words as text data to evaluate and manipulate, and I’ve created a GitHub gist of XML input and XQuery code for use with this tutorial.

A quick run-through of basic XQuery string functions that takes you up to writing your own XQuery function.
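
If you want the thirty-second version before reading Adam’s tutorial, the workhorse functions and a small user-defined one look like this (data invented, not from his gist):

  declare function local:slug($s as xs:string) as xs:string {
    lower-case(replace(normalize-space($s), "[^A-Za-z0-9]+", "-"))
  };

  (
    upper-case("xquery"),                      (: "XQUERY" :)
    substring("serendipity", 1, 6),            (: "serend" :)
    tokenize("to be or not", " "),             (: "to", "be", "or", "not" :)
    local:slug("XQuery String Functions")      (: "xquery-string-functions" :)
  )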

While we wait for more reports from the Vanderbilt University XQuery Working Group, have you considered using XQuery to impose different views on a single text document?

For example, some Bible translations follow the “traditional” chapter and verse divisions (a very late addition), while others use paragraph-level organization and largely ignore the traditional verses.

Creating a view of a single source text as either one or both should not involve permanent changes to a source file in XML. Or at least not the original source file.

If for processing purposes there was a need for a static file rendering one way or the other, that’s doable but should be separate from the original XML file.
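
A sketch of the idea: two read-only “view” functions over one hypothetical source, neither of which touches the file. The interesting cases, where verses are milestone markers inside paragraphs, take more work, but the principle is the same:

  declare function local:verse-view($d as document-node()) as element()* {
    for $v in $d//verse
    return <verse n="{$v/@n}">{ normalize-space($v) }</verse>
  };

  declare function local:paragraph-view($d as document-node()) as element()* {
    for $p in $d//p
    return <p>{ normalize-space($p) }</p>
  };

  local:verse-view(doc("translation.xml"))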

XML Prague 2017, February 9-11, 2017 – Registration Opens!

Wednesday, November 16th, 2016

XML Prague 2017, February 9-11, 2017

I mentioned XML Prague 2017 last month and now, after the election of Donald Trump as president of the United States, registration for the conference opens!

Coincidence?

Maybe. 😉

Even if you are returning to the U.S. after the conference, XML Prague will be a welcome respite from the tempest of news coverage of what isn’t known about the impending Trump administration.

At 120 Euros for three days, this is a great investment both professionally and emotionally.

Enjoy!