TUSTEP is open source – with TXSTEP providing a new XML interface

Monday, July 9th, 2012

I won’t recount how many years ago I first received email from Wilhelm Ott about TUSTEP. 😉

From the TUSTEP homepage:

TUSTEP is a professional toolbox for scholarly processing textual data (including those in non-latin scripts) with a strong focus on humanities applications. It contains modules for all stages of scholarly text data processing, starting from data capture and including information retrieval, text collation, text analysis, sorting and ordering, rule-based text manipulation, and output in electronic or conventional form (including typesetting in professional quality).

Since the title “big data” is taken, perhaps we should take “complex data” for texts.

If you are exploring textual data in any detail or with XML, you should take a look at the TUSTEP project and its new XML interface, TXSTEP.

Or consider contributing to the project as well.

Wilhelm Ott writes (in part):

We are pleased to announce that, starting with the release 2012, TUSTEP is available as open source software. It is distributed under the Revised BSD Licence and can be downloaded from

TUSTEP has a long tradition as a highly flexible, reliable, efficient suite of programs for humanities computing. It started in the early 70ies as a tool for supporting humanities projects at the University of Tübingen, relying on own funds of the University. From 1985 to 1989, a substantial grant from the Land Baden-Württemberg officially opened its distribution beyond the limits of the University and started its success as a highly appreciated research tool for many projects at about a hundred universities and academic institutions in the German speaking part of the world, represented since 1993 in the International TUSTEP User Group (ITUG). Reports on important projects relying on TUSTEP and a list of publications (including lexicographic works and critical editions) can be found on the TUSTEP webpage.

TXSTEP, presently being developed in cooperation with Stuttgart Media University, offers a new XML-based user interface to the TUSTEP programs. Compared to the original TUSTEP commands, we see important advantages:

  • it will offer an up-to-date established syntax for scripting;
  • it will show the typical benefits of working with an XML editor, like content completion, highlighting, showing annotations, and, of course, verifying the code;
  • it will offer – to a certain degree – a self teaching environment by commenting on the scope of every step;
  • it will help to avoid many syntactical errors, even compared to the original TUSTEP scripting environment;
  • the syntax is in English, providing a more widespread usability than TUSTEP’s German command language.

At the TEI conference last year in Würzburg, we presented a first prototype to an international audience. We look forward to DH2012 in Hamburg next week where, during the Poster Session, a more enhanced version which already contains most of TUSTEP’s functions will be presented. A demonstration of TXSTEP’s functionality will include tasks which cannot easily be performed by existing XML tools.

After the demo, you are invited to download a test version of TXSTEP to play with, to comment on it and to help make it a great and flexible tool for everyday – and complex – questions.

OK, I confess a fascination with complex textual analysis.