Syntacticus – Early Indo-European Languages


From the about page:

Syntacticus provides easy access to around a million morphosyntactically annotated sentences from a range of early Indo-European languages.

Syntacticus is an umbrella project for the PROIEL Treebank, the TOROT Treebank and the ISWOC Treebank, which all use the same annotation system and share similar linguistic priorities. In total, Syntacticus contains 80,138 sentences or 936,874 tokens in 10 languages.

We are constantly adding new material to Syntacticus. The ultimate goal is to have a representative sample of different text types from each branch of early Indo-European. We maintain lists of texts we are working on at the moment, which you can find on the PROIEL Treebank and the TOROT Treebank pages, but this is extremely time-consuming work so please be patient!

The focus for Syntacticus at the moment is to consolidate and edit our documentation so that it is easier to approach. We are very aware that the current documentation is inadequate! But new features and better integration with our development toolchain are also on the horizon in the near future.

Language Size
Ancient Greek 250,449 tokens
Latin 202,140 tokens
Classical Armenian 23,513 tokens
Gothic 57,211 tokens
Portuguese 36,595 tokens
Spanish 54,661 tokens
Old English 29,406 tokens
Old French 2,340 tokens
Old Russian 209,334 tokens
Old Church Slavonic 71,225 tokens
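
As a quick sanity check, the per-language figures above can be tallied against the total quoted from the about page. This is only a minimal Python sketch: the counts are copied from the table, and the variable names are my own, not part of any Syntacticus tooling.

```python
# Token counts copied from the table above (illustrative names, not a Syntacticus API).
token_counts = {
    "Ancient Greek": 250_449,
    "Latin": 202_140,
    "Classical Armenian": 23_513,
    "Gothic": 57_211,
    "Portuguese": 36_595,
    "Spanish": 54_661,
    "Old English": 29_406,
    "Old French": 2_340,
    "Old Russian": 209_334,
    "Old Church Slavonic": 71_225,
}

# Total token count as quoted from the about page.
REPORTED_TOTAL = 936_874

total = sum(token_counts.values())

# List languages from largest to smallest with their share of the corpus.
for language, count in sorted(token_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{language:<20} {count:>8,} {count / total:6.1%}")

print(f"{'Total':<20} {total:>8,}")
print("Matches reported total:", total == REPORTED_TOTAL)
```

The ten counts sum to 936,874, which agrees with the total quoted above.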

The mention of Old Russian should attract attention, given the media frenzy over Russia these days. However, the data at Syntacticus is meaningful, unlike news reports, which reflect Western ignorance more often than they report news.

You may have noticed US reports have moved from guilt by association to guilt by nationality (anyone who is Russian = Putin confidant) and are approaching guilt by proximity (citizen of any country near Russia = Putin puppet).

It’s hard to imagine a political campaign in which no one commits a crime, but traditionally, in law courts anyway, proof precedes a finding of guilt.

I’m looking forward to competent evidence (that’s legal terminology with a specific meaning), tested in an open proceeding against the elements of defined offenses. That’s a far cry from current discussions.
