Archive for the ‘Greek’ Category

An Initial Reboot of Oxlos

Tuesday, April 18th, 2017

An Initial Reboot of Oxlos by James Tauber.

From the post:

In a recent post, Update on LXX Progress, I talked about the possibility of putting together a crowd-sourcing tool to help share the load of clarifying some parse code errors in the CATSS LXX morphological analysis. Last Friday, Patrick Altman and I spent an evening of hacking and built the tool.

Back at BibleTech 2010, I gave a talk about Django, Pinax, and some early ideas for a platform built on them to do collaborative corpus linguistics. Patrick Altman was my main co-developer on some early prototypes and I ended up hiring him to work with me at Eldarion.

The original project was called oxlos after the betacode transcription of the Greek word for “crowd”, a nod to “crowd-sourcing”. Work didn’t continue much past those original prototypes in 2010 and Pinax has come a long way since so, when we decided to work on oxlos again, it made sense to start from scratch. From the initial commit to launching the site took about six hours.

At the moment there is one collective task available—clarifying which of a set of parse codes is valid for a given verb form in the LXX—but as the need for others arises, it will be straightforward to add them (and please contact me if you have similar tasks you’d like added to the site).
… (emphasis in the original)

Crowd sourcing, parse code errors in the CATSS LXX morphological analysis, Patrick Altman and James Tauber! What’s more could you ask for!

Well, assuming you enjoy Django development, or have Greek morphology, sign up at:

After mastering Greek, you don’t really want to lose it from lack of practice. Yes? Perfect opportunity for recent or even not so recent Classics and divinity majors.

I suppose that’s a nice way to say you won’t be encountering LXX Greek on ESPN or CNN. 😉

New MorphGNT Releases and Accentuation Analysis

Thursday, February 16th, 2017

New MorphGNT Releases and Accentuation Analysis by James Tauber.

From the post:

Back in 2015, I talked about Annotating the Normalization Column in MorphGNT. This post could almost be considered Part 2.

I recently went back to that work and made a fresh start on a new repo gnt-accentuation intended to explain the accentuation of each word in the GNT (and eventually other Greek texts). There’s two parts to that: explaining why the normalized form is accented the way it but then explaining why the word-in-context might be accented differently (clitics, etc). The repo is eventually going to do both but I started with the latter.

My goal with that repo is to be part of the larger vision of an “executable grammar” I’ve talked about for years where rules about, say, enclitics, are formally written up in a way that can be tested against the data. This means:

  • students reading a rule can immediately jump to real examples (or exceptions)
  • students confused by something in a text can immediately jump to rules explaining it
  • the correctness of the rules can be tested
  • errors in the text can be found

It is the fourth point that meant that my recent work uncovered some accentuation issues in the SBLGNT, normalization and lemmatization. Some of that has been corrected in a series of new releases of the MorphGNT: 6.08, 6.09, and 6.10. See for details of specifics. The reason for so many releases was I wanted to get corrections out as soon as I made them but then I found more issues!

There are some issues in the text itself which need to be resolved. See the Github issue for details. I’d very much appreciate people’s input.

In the meantime, stay tuned for more progress on gnt-accentuation.

Was it random chance that I saw this announcement from James and Getting your hands dirty with the Digital Manuscripts Toolkit on the same day?


I should mention that Codex Sinaiticus (second oldest witness to the Greek New Testament) and numerous other Greek NT manuscripts have been digitized by the British Library.

Paring these resources together offers a great opportunity to discover the Greek NT text as choices made by others. (Same holds true for the Hebrew Bible as well.)

greek-accentuation 1.0.0 Released

Thursday, July 28th, 2016

greek-accentuation 1.0.0 Released by James Tauber.

From the post:

greek-accentuation has finally hit 1.0.0 with a couple more functions and a module layout change.

The library (which I’ve previously written about here) has been sitting on 0.9.9 for a while and I’ve been using it sucessfully in my inflectional morphology work for 18 months. There were, however, a couple of functions that lived in the inflectional morphology repos that really belonged in greek-accentuation. They have now been moved there.

If that sounds a tad obscure, some additional explanation from an earlier post by James:

It [greek-accentuation] consists of three modules:

  • characters
  • syllabify
  • accentuation

The characters module provides basic analysis and manipulation of Greek characters in terms of their Unicode diacritics as if decomposed. So you can use it to add, remove or test for breathing, accents, iota subscript or length diacritics.

The syllabify module provides basic analysis and manipulation of Greek syllables. It can syllabify words, give you the onset, nucleus, code, rime or body of a syllable, judge syllable length or give you the accentuation class of word.

The accentuation module uses the other two modules to accentuate Ancient Greek words. As well as listing possible_accentuations for a given unaccented word, it can produce recessive and (given another form with an accent) persistent accentuations.

Another name from my past and a welcome reminder that not all of computer science is focused on recommending ephemera for our consumption.

Modelling Stems and Principal Part Lists (Attic Greek)

Friday, June 17th, 2016

Modelling Stems and Principal Part Lists by James Tauber.

From the post:

This is part 0 of a series of blog posts about modelling stems and principal part lists, particularly for Attic Greek but hopefully more generally applicable. This is largely writing up work already done but I’m doing cleanup as I go along as well.

A core part of the handling of verbs in the Morphological Lexicon is the set of terminations and sandhi rules that can generate paradigms attested in grammars like Louise Pratt’s The Essentials of Greek Grammar. Another core part is the stem information for a broader range of verbs usually conveyed in works like Pratt’s in the form of lists of principal parts.

A rough outline of future posts is:

  • the sources of principal part lists for this work
  • lemmas in the Pratt principal parts
  • lemma differences across lists
  • what information is captured in each of the lists individually
  • how to model a merge of the lists
  • inferring stems from principal parts
  • stems, terminations and sandhi
  • relationships between stems
  • ???

I’ll update this outline with links as posts are published.

(emphasis in original)

A welcome reminder of projects that transcend the ephemera that is social media.

Or should I say “modern” social media?

The texts we parse so carefully were originally spoken, recorded and copied, repeatedly, without the benefit of modern reference grammars and/or dictionaries.