### New MorphGNT Releases and Accentuation Analysis

Thursday, February 16th, 2017

From the post:

Back in 2015, I talked about Annotating the Normalization Column in MorphGNT. This post could almost be considered Part 2.

I recently went back to that work and made a fresh start on a new repo gnt-accentuation intended to explain the accentuation of each word in the GNT (and eventually other Greek texts). There’s two parts to that: explaining why the normalized form is accented the way it is, but then explaining why the word-in-context might be accented differently (clitics, etc). The repo is eventually going to do both but I started with the latter.

My goal with that repo is to be part of the larger vision of an “executable grammar” I’ve talked about for years where rules about, say, enclitics, are formally written up in a way that can be tested against the data. This means:

• students reading a rule can immediately jump to real examples (or exceptions)
• students confused by something in a text can immediately jump to rules explaining it
• the correctness of the rules can be tested
• errors in the text can be found

It is the fourth point that meant that my recent work uncovered some accentuation issues in the SBLGNT, normalization and lemmatization. Some of that has been corrected in a series of new releases of the MorphGNT: 6.08, 6.09, and 6.10. See https://github.com/morphgnt/sblgnt/releases for details of specifics. The reason for so many releases was I wanted to get corrections out as soon as I made them but then I found more issues!

There are some issues in the text itself which need to be resolved. See the Github issue https://github.com/morphgnt/sblgnt/issues/52 for details. I’d very much appreciate people’s input.

In the meantime, stay tuned for more progress on gnt-accentuation.
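To make the “executable grammar” idea concrete, here is a minimal sketch in Python. It is not code from gnt-accentuation. The rule it tests (a normalized form should carry no grave accent, since the grave only arises in running text) is one simple illustration, and the `ROWS` data and function names are hypothetical.

```python
import unicodedata

GRAVE = "\u0300"  # combining grave accent (Greek varia after NFD decomposition)


def has_grave(word: str) -> bool:
    """Return True if any character in the word carries a grave accent."""
    return GRAVE in unicodedata.normalize("NFD", word)


def rule_exceptions(rows):
    """Return the rows whose normalized form breaks the rule
    'a normalized form never carries a grave accent'."""
    return [row for row in rows if has_grave(row[1])]


# Hypothetical (form in text, normalized form) pairs.
# The last row is deliberately wrong so the rule has something to catch.
ROWS = [
    ("καὶ", "καί"),
    ("θεὸς", "θεός"),
    ("λόγος", "λόγος"),
    ("τὸν", "τὸν"),
]

if __name__ == "__main__":
    for text_form, normalized in rule_exceptions(ROWS):
        print(f"possible normalization error: {text_form} -> {normalized}")
```

The same function serves two of the uses listed in the post: the rows it returns are either exceptions worth showing a student or errors in the data worth reporting.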

Was it random chance that I saw this announcement from James and Getting your hands dirty with the Digital Manuscripts Toolkit on the same day?

😉

I should mention that Codex Sinaiticus (second oldest witness to the Greek New Testament) and numerous other Greek NT manuscripts have been digitized by the British Library.

Pairing these resources together offers a great opportunity to discover the Greek NT text as choices made by others. (Same holds true for the Hebrew Bible as well.)

### S20-211a Hebrew Bible Technology Buffet – November 20, 2016 (save that date!)

Tuesday, October 18th, 2016

S20-211a Hebrew Bible Technology Buffet

From the webpage:

On Sunday, November 20th 2016, from 1:00 PM to 3:30 PM, GERT will host a session with the theme “Hebrew Bible Technology Buffet” at the SBL Annual Meeting in room 305 of the Convention Center. Barry Bandstra of Hope College will preside.

The session has four presentations:

Presentations will be followed by a discussion session.

You will need to register for the Annual Meeting to attend the session.

Assuming they are checking “badges” to make sure attendees have registered. Registration is very important to those who “foster” biblical scholarship by comping travel and rooms for their close friends.

PS: The website reports non-member registration is \$490.00. I would like to think that is a misprint but I suspect it’s not.

That’s one way to isolate yourself from an interested public. By way of contrast, snail-mail Biblical Greek courses in the 1890s had tens of thousands of subscribers. When academics complain of being marginalized, use this as an example of self-marginalization.

### Modelling Stems and Principal Part Lists (Attic Greek)

Friday, June 17th, 2016

From the post:

This is part 0 of a series of blog posts about modelling stems and principal part lists, particularly for Attic Greek but hopefully more generally applicable. This is largely writing up work already done but I’m doing cleanup as I go along as well.

A core part of the handling of verbs in the Morphological Lexicon is the set of terminations and sandhi rules that can generate paradigms attested in grammars like Louise Pratt’s The Essentials of Greek Grammar. Another core part is the stem information for a broader range of verbs usually conveyed in works like Pratt’s in the form of lists of principal parts.

A rough outline of future posts is:

• the sources of principal part lists for this work
• lemmas in the Pratt principal parts
• lemma differences across lists
• what information is captured in each of the lists individually
• how to model a merge of the lists
• inferring stems from principal parts
• stems, terminations and sandhi
• relationships between stems
• ???

I’ll update this outline with links as posts are published.

(emphasis in original)
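As a rough illustration of what “stems, terminations and sandhi” can mean in code, here is a small Python sketch. It is not taken from the Morphological Lexicon. It ignores accents entirely, covers only the familiar consonant-plus-sigma changes (labial + σ gives ψ, velar + σ gives ξ, a dental drops before σ), and the function and variable names are my own.

```python
LABIALS = set("πβφ")
VELARS = set("κγχ")
DENTALS = set("τδθ")

# Unaccented future active indicative terminations (illustrative only).
FUTURE_ACTIVE = ["σω", "σεις", "σει", "σομεν", "σετε", "σουσι"]


def join(stem: str, termination: str) -> str:
    """Join a stem and a termination, applying consonant + sigma sandhi."""
    if stem and termination.startswith("σ"):
        last = stem[-1]
        if last in LABIALS:
            return stem[:-1] + "ψ" + termination[1:]
        if last in VELARS:
            return stem[:-1] + "ξ" + termination[1:]
        if last in DENTALS:
            # The dental drops out and the sigma remains.
            return stem[:-1] + termination
    return stem + termination


def paradigm(stem: str, terminations=FUTURE_ACTIVE):
    """Generate one unaccented form per termination."""
    return [join(stem, t) for t in terminations]


if __name__ == "__main__":
    for stem in ["λυ", "γραφ", "διωκ", "πειθ"]:
        print(stem, paradigm(stem))
```

Keeping the sandhi rules in one function and the terminations in a plain list keeps the two kinds of information separate, so each can be checked against a grammar on its own.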

A welcome reminder of projects that transcend the ephemera that is social media.

Or should I say “modern” social media?

The texts we parse so carefully were originally spoken, recorded and copied, repeatedly, without the benefit of modern reference grammars and/or dictionaries.

Enjoy!

### Bible vs. Quran – Who’s More Violent?

Friday, January 22nd, 2016

Bible vs. Quran – Text analysis answers: Is the Quran really more violent than the Bible? by Tom H. C. Anderson.

Tom’s series appears in three parts, all sharing a common title:

Part I: The Project

From part 1:

With the proliferation of terrorism connected to Islamic fundamentalism in the late-20th and early 21st centuries, the question of whether or not there is something inherently violent about Islam has become the subject of intense and widespread debate.

Even before 9/11—notably with the publication of Samuel P Huntington’s “Clash of Civilizations” in 1996—pundits have argued that Islam incites followers to violence on a level that sets it apart from the world’s other major religions.

The November 2015 Paris attacks and the politicking of a U.S. presidential election year—particularly candidate Donald Trump’s call for a ban on Muslims entering the country and President Obama’s response in the State of the Union address last week—have reanimated the dispute in the mainstream media, and proponents and detractors, alike, have marshalled “experts” to validate their positions.

To understand a religion, it’s only logical to begin by examining its literature. And indeed, extensive studies in a variety of academic disciplines are routinely conducted to scrutinize and compare the texts of the world’s great religions.

We thought it would be interesting to bring to bear the sophisticated data mining technology available today through natural language processing and unstructured text analytics to objectively assess the content of these books at the surface level.

So, we’ve conducted a shallow but wide comparative analysis using OdinText to determine with as little bias as possible whether the Quran is really more violent than its Judeo-Christian counterparts.

Part II: Emotional Analysis Reveals Bible is “Angriest”

From part 2:

In my previous post, I discussed our potentially hazardous plan to perform a comparative analysis using an advanced data mining platform—OdinText—across three of the most important texts in human history: The Old Testament, The New Testament and the Quran.

Author’s note: For more details about the data sources and methodology, please see Part I of this series.

The project was inspired by the ongoing public debate around whether or not terrorism connected with Islamic fundamentalism reflects something inherently and distinctly violent about Islam compared to other major religions.

Before sharing the first set of results with you here today, due to the sensitive nature of this topic, I feel obliged to reiterate that this analysis represents only a cursory, superficial view of just the texts, themselves. It is in no way intended to advance any agenda or to conclusively prove anyone’s point.

Part III – Violence, Mercy and Non-Believers – to appear soon.

A comparison that may be an inducement for some to learn text/sentiment analysis but I would view its results with a great deal of caution.

Two of the comments to the first post read:

(comment) If you’re not completing the analysis in the native language, you’re just analyzing the translators’ understanding and interpretation of the texts; this is very different than the actual texts.

(to which a computational linguist replies) Technically, that is certainly true. However, if you are looking at broad categories of sentiment or topic, as this analysis does, there should be little variation in the results between translations, or by using the original. As well, it could be argued that what is most of interest is the viewpoint of the interpreters of the text, hence the translations may be *more* of interest, to some extent. But I would not expect that this analysis would be very sensitive at all to variations in translation or even language.

I find the position taken by the computational linguist almost incomprehensible.

Not only do we lack anything approaching a full social context for any of the texts in their original languages, but terms that occur once (hapaxes) number approximately 1,300 in the Hebrew Bible and over 3,500 in the New Testament. For a discussion of the Qur’ān, see: Hapaxes in the Qur’ān: identifying and cataloguing lone words (and loanwords) by Shawkat M. Toorawa. Toorawa includes a list of hapaxes for the Qur’ān, a discussion of why they are important and a comparison to other texts.
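For anyone who wants to see what counting hapaxes involves, here is a minimal Python sketch. It counts surface forms in whatever text you hand it, which is an assumption on my part: published hapax counts are normally made over lemmas in the original languages, so this will not reproduce the figures above.

```python
import re
from collections import Counter


def hapaxes(text: str):
    """Return the word forms that occur exactly once, sorted alphabetically."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return sorted(word for word, n in counts.items() if n == 1)


if __name__ == "__main__":
    sample = "In the beginning was the Word, and the Word was with God"
    print(hapaxes(sample))
```

Run the same function over two translations of one passage and the lists will differ, which is part of the problem with analyzing translations.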

Here is a quick example of where social context can change how you read a text:

23 The priest is to write these curses on a scroll and then wash them off into the bitter water. 24 He shall have the woman drink the bitter water that brings a curse, and this water will enter her and cause bitter suffering. 25 The priest is to take from her hands the grain offering for jealousy, wave it before the LORD and bring it to the altar. 26 The priest is then to take a handful of the grain offering as a memorial offering and burn it on the altar; after that, he is to have the woman drink the water. 27 If she has defiled herself and been unfaithful to her husband, then when she is made to drink the water that brings a curse, it will go into her and cause bitter suffering; her abdomen will swell and her thigh waste away, and she will become accursed among her people. (Numbers 5:23-27)

Does that sound sexist to you?

Interesting because a Hebrew Bible professor of mine argued that it is one of the earliest pro-women passages in the text.

Think about the social context. There are no police and no domestic courts. Short of retribution from the wife’s family members, there are no constraints on what a husband can do to his wife. Even killing her wasn’t beyond the pale.

Given that context, setting up a test that no one can fail, in the presence of a priest, which also deters resorting to a violent remedy, sounds like it gets the wife out of a dangerous situation where the priest can say: “See, you were jealous for no reason, etc.”

There’s no guarantee that is the correct interpretation either but it does accord with present understandings of law and custom at the time. The preservation of order in the community, no mean thing in the absence of an organized police force, was an important thing.

The English words used in translations also have their own context, which may be resolved differently from those in the original languages.

As I said, interesting but consider with a great deal of caution.

### Querying Biblical Texts: Part 1 [Humanists Take Note!]

Saturday, November 14th, 2015

From the post:

This is the first in a series on querying Greek texts with XQuery. We will also look at the differences among various representations of the same text, starting with the base text, morphology, and three different treebank formats. As we will see, the representation of a text indicates what the producer of the text was most interested in, and it determines the structure and power of queries done on that particular representation. The principles discussed here also apply to other languages.

This is written as a tutorial, and it can be read in two ways. The first time through, you may want to simply read the text. If you want to really learn how to do this yourself, you should download an XQuery processor and some data (in your favorite biblical language) and try these queries and variations on them.

Humanists need to follow this series and pass it along to others.

Texts of interest to you will vary but the steps Jonathan covers are applicable to all texts (well, depending upon your encoding).

In exchange for learning a little XQuery, you can gain a good degree of mastery over XML encoded texts.
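If you don’t have an XQuery processor handy, the flavor of such queries can be approximated with the limited XPath support in Python’s standard library. This is only a sketch: the element and attribute names (`w`, `lemma`, `pos`) are a toy encoding I made up, not the markup used in Jonathan’s series.

```python
import xml.etree.ElementTree as ET

# Toy encoding of John 1:1a. The element and attribute names are hypothetical.
SAMPLE = """
<book name="John">
  <verse ref="1.1">
    <w lemma="ἐν" pos="P">Ἐν</w>
    <w lemma="ἀρχή" pos="N">ἀρχῇ</w>
    <w lemma="εἰμί" pos="V">ἦν</w>
    <w lemma="ὁ" pos="RA">ὁ</w>
    <w lemma="λόγος" pos="N">λόγος</w>
  </verse>
</book>
"""


def words_by_pos(xml_text: str, pos: str):
    """Return the text of every <w> element with the given pos attribute."""
    root = ET.fromstring(xml_text)
    return [w.text for w in root.findall(f".//w[@pos='{pos}']")]


if __name__ == "__main__":
    print(words_by_pos(SAMPLE, "N"))
    print(words_by_pos(SAMPLE, "V"))
```

The path expression `//w[@pos='N']` would do the same selection in XQuery. What you can ask depends entirely on what the encoder chose to mark up, which is the point of the series.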

Enjoy!

### Praise For Conservative Bible Translations

Wednesday, September 9th, 2015

I don’t often read praise for conservative Bible translations but conservative Bible translations can have unexpected uses:

Anders Søgaard and his colleagues from the project LOWLANDS: Parsing Low-Resource Languages and Domains are utilising the texts which were annotated for big languages to develop language technology for smaller languages, the key to which is to find translated texts so that the researchers can transfer knowledge of one language’s grammar onto another language:

“The Bible has been translated into more than 1,500 languages, even the smallest and most ‘exotic’ ones, and the translations are extremely conservative; the verses have a completely uniform structure across the many different languages which means that we can make suitable computer models of even very small languages where we only have a couple of hundred pages of biblical text,” Anders Søgaard says and elaborates:

“We teach the machines to register what is translated with what in the different translations of biblical texts, which makes it possible to find so many similarities between the annotated and unannotated texts that we can produce exact computer models of 100 different languages — languages such as Swahili, Wolof and Xhosa that are spoken in Nigeria. And we have made these models available for other developers and researchers. This means that we will be able to develop language technology resources for these languages similar to those which speakers of languages such as English and French have.”

Anders Søgaard and his colleagues have recently presented their results in the article “If all you have is a bit of the Bible” at the conference Annual Meeting of the Association of Computational Linguistics.

The abstract for the paper: If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages reads:

We present a simple method for learning part-of-speech taggers for languages like Akawaio, Aukan, or Cakchiquel – languages for which nothing but a translation of parts of the Bible exists. By aggregating over the tags from a few annotated languages and spreading them via word-alignment on the verses, we learn POS taggers for 100 languages, using the languages to bootstrap each other. We evaluate our cross-lingual models on the 25 languages where test sets exist, as well as on another 10 for which we have tag dictionaries. Our approach performs much better (20-30%) than state-of-the-art unsupervised POS taggers induced from Bible translations, and is often competitive with weakly supervised approaches that assume high-quality parallel corpora, representative monolingual corpora with perfect tokenization, and/or tag dictionaries. We make models for all 100 languages available.
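The core idea in the abstract, spreading tags from annotated languages to an unannotated one through word alignments, can be sketched in a few lines of Python. This is a toy majority vote of my own, not the authors’ method. The alignments are given by hand here, whereas in practice they would come from a word aligner run over the parallel verses.

```python
from collections import Counter


def project_tags(target_len, sources):
    """Project POS tags onto a target verse by majority vote.

    sources is a list of (tags, alignment) pairs, one per annotated language:
    tags is the list of tags for that language's verse, and alignment is a
    list of (source_index, target_index) pairs.
    Target tokens with no aligned source token get None.
    """
    votes = [Counter() for _ in range(target_len)]
    for tags, alignment in sources:
        for src_idx, tgt_idx in alignment:
            votes[tgt_idx][tags[src_idx]] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


if __name__ == "__main__":
    # Three hypothetical annotated translations of one three-word target verse.
    sources = [
        (["PRON", "VERB", "NOUN"], [(0, 0), (1, 1), (2, 2)]),
        (["VERB", "PRON", "NOUN"], [(0, 1), (1, 0), (2, 2)]),
        (["DET", "NOUN", "VERB"], [(1, 2), (2, 1)]),
    ]
    print(project_tags(3, sources))
```

Because Bible verses line up one-to-one across translations, the alignment only has to be solved within a verse, which is what makes such a small parallel text usable at all.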

All of the resources used in this project, along with their models, can be found at: https://bitbucket.org/lowlands/

Don’t forget conservative Bible translations if you are doing linguistic models.

### New Testament Virtual Manuscript Room

Tuesday, June 2nd, 2015

New Testament Virtual Manuscript Room

From the webpage:

This site is devoted to the study of Greek New Testament manuscripts. The New Testament Virtual Manuscript Room is a place where scholars can come to find the most exhaustive list of New Testament manuscript resources, can contribute to marking attributes about these manuscripts, and can find state of the art tools for researching this rich dataset.

While our tools are reasonably functional for anonymous users, they provide additional features and save options once a user has created an account and is logged in on the site. For example, registered users can save transcribed pages to their personal account and create personalized annotations to images.

A close friend has been working on this project for several years. Quite remarkable although I would prefer it to feature Hebrew (and older) texts. 😉

### New Testament Transcription

Thursday, July 24th, 2014

There is an excellent example of a transcription interface at: http://ancientlives.org/tutorial/transcribe. A screen shot won’t display well but I can sketch the general form of the interface:

A user selects a character in the papyrus by “clicking” on its center point. That point can be moved if need be. The character will be highlighted and you then select the matching character on the keyboard.

There are examples of the instructions that can be played if you are uncertain at any point.

I can’t imagine a more intuitive transcription interface.

I have suggested crowd sourcing transcription of the New Testament (and Old Testament/Hebrew Bible as well) before to groups concerned with those texts. The response has always been that there are cases that require expertise to transcribe. Fair enough, that’s very true.

But, with crowd transcription, we would be able to use the results of hundreds if not thousands of transcribers to identify the characters or symbols that have no consistent transcription. Those particular cases could be “kicked up stairs” to the experts.
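A minimal sketch of that triage step in Python, assuming each character position has collected a list of readings from different transcribers. The 80% agreement threshold and the data layout are my assumptions, not anything Ancient Lives documents.

```python
from collections import Counter


def triage(readings, threshold=0.8):
    """Split positions into accepted readings and positions for experts.

    readings maps a position id to the list of characters that individual
    transcribers entered for it. A reading is accepted when its share of
    the votes reaches the threshold; everything else is escalated.
    """
    accepted, escalate = {}, []
    for position, votes in readings.items():
        if not votes:
            escalate.append(position)
            continue
        top, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= threshold:
            accepted[position] = top
        else:
            escalate.append(position)
    return accepted, escalate


if __name__ == "__main__":
    readings = {
        "line1-char1": ["α"] * 9 + ["ο"],           # near-unanimous
        "line1-char2": ["ε", "σ", "ε", "ο", "σ"],   # no consistent reading
    }
    accepted, escalate = triage(readings)
    print("accepted:", accepted)
    print("for the experts:", escalate)
```

Keeping the raw votes alongside the accepted reading is what would make the final transcription traceable.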

The end result, assuming access to all the extant manuscripts, would be a traceable transcription of all the sources for the New Testament back to particular manuscripts or papyri. With all the witnesses to a particular character or word being at the reader’s fingertips. (Ditto for the Old Testament/Hebrew Bible.)

We have the technology to bring the witnesses to the biblical text to everyone who is interested. The only remaining question is whether funders can overcome the reluctance of the usual suspects to granting everyone that level of access.

Personally I have no fear of free and open access to the witnesses to the biblical text. As a text the Bible has resisted efforts to pervert its meaning for more than two (2) thousand years. It can take care of itself.

### Thirty-three Greek Biblical manuscripts added to Digitised Manuscripts

Saturday, July 12th, 2014

Thirty-three Greek Biblical manuscripts added to Digitised Manuscripts by Cillian O’Hogan.

From the post:

The third phase of the British Library's Greek Manuscripts Digitisation Project is now well underway. So far, the following items, all Greek biblical items, have been added to Digitised Manuscripts. We will continue to update the blog with new additions over the course of the year, and will also look at some individual manuscripts in more detail in later posts. We are extremely grateful to the foundations and individuals who have funded this project, especially the Stavros Niarchos Foundation, the A. G. Leventis Foundation, Sam Fogg, the Sylvia Ioannou Foundation and the Thriplow Charitable Trust.

Add MS 24112, Four Gospels in Greek (Gregory-Aland 694; Scrivener evan. 598; von Soden ε 502), written throughout with space for a Latin translation, which has been added for a small number of verses. 15th century, possibly Italy.

Add MS 24373, Four Gospels (Gregory-Aland 695; Scrivener evan. 599; von Soden ε 327), with illuminated Evangelist portraits. 13th century. Also online is an old 19th-century binding for this manuscript.

Add MS 24374, Fragments from a Gospel Lectionary with ekphonetic notation (Gregory-Aland l 325; Scrivener evst. 273). 13th century.

Add MS 24376, Four Gospels (Gregory-Aland 696; Scrivener evan. 600; von Soden ε 328), with illuminated Evangelist portraits (St Mark illustrated above). 14th century (illuminations added in the 16th century), Constantinople.

Add MS 24377, Gospel Lectionary (Gregory-Aland l 326; Scrivener evst. 274), with ekphonetic notation, imperfect. 2nd half of the 12th century, possibly from the Monastery of Patir in southern Italy.

Add MS 24378, Menaion for September, October, November, December, January and February (Gregory-Aland l 927; Scrivener evst. 275). 13th/14th century.

Add MS 24379, Gospel Lectionary (Gregory-Aland l 327; Scrivener evst. 276), imperfect. 14th century.

Add MS 24380, Gospel Lectionary (Gregory-Aland l 328; Scrivener evst. 277), with ekphonetic notation, imperfect. 14th century.

Add MS 27860, Gospel Lectionary (Gregory-Aland l 329; Scrivener evst. 278), imperfect at the beginning, with marginal decorations throughout. Late 10th/early 11th century, Southern Italy (possibly Capua). Also online is an old 17th-century binding for this manuscript.

Add MS 27861, Gospels (Gregory-Aland e 698; Scrivener evan 602; von Soden ε 436), imperfect (lacking Matthew). 14th century.

Add MS 28815, New Testament, imperfect (Gregory-Aland 699; Scrivener evst. 603; von Soden δ 104), with Evangelist portraits and a silver-gilt plated cover. Mid-10th century, Constantinople. The subject of a recent blog post along with Egerton 3145.

Add MS 28816, New Testament, from Acts onwards (Gregory-Aland 203; Scrivener act. 232; von Soden α 203), with Euthalian apparatus, and other works. Written between 1108 and 1111 by the monk Andreas in March 1111, in the cell of the monk Meletius in the monastery of the Saviour.

Add MS 28818, Gospel Lectionary (Gregory-Aland l 331; Scrivener evst. 280). 1272, written by the monk Metaxares.

Add MS 29713, Gospel Lectionary (Gregory-Aland l 332; Scrivener evst. 62), imperfect at the beginning. 14th century.

Add MS 31208, Gospel Lectionary with ekphonetic notation (Gregory-Aland l 333; Scrivener evst *281), imperfect. 13th century, possibly Constantinople.

Add MS 31920, Gospel Lectionary (Gregory-Aland l 335; Scrivener evst 283), imperfect and mutilated. 12th century, South Italy (possibly Reggio).

Add MS 32051, Lectionary of the Acts and Epistles, imperfect, with ekphonetic notation (Gregory-Aland l 169; Scrivener apost. 52). 13th century.

Add MS 32341, Four Gospels (Gregory-Aland 494; Scrivener evan. 325; von Soden ε 437), imperfect. 14th century.

Add MS 33214, New Testament: Acts and Epistles (Gregory-Aland 1765; von Soden α 486). 14th century.

Add MS 33277, Four Gospels (Gregory-Aland 892; von Soden ε 1016; Scrivener evan. 892). 9th century, with replacement leaves added in the 13th and 16th centuries.

Add MS 34108, Four Gospels (Gregory-Aland 1280; Scrivener evan. 322; von Soden ε 1319). 12th century, with some replacement leaves added in the 15th century.

Add MS 34602, Fragments from two Psalters (Rahlfs-Fraenkel 2017, 1217) (illustrated above). 7th century and 10th century, Egypt.

Add MS 36751, Gospel Lectionary with ekphonetic neumes, called ἐκλογάδι(ον) (Gregory-Aland l 1491). Completed in 1008 at the Holy Monastery of Iviron, Mount Athos, by the scribe Theophanes.

Add MS 36752, Four Gospels (Gregory-Aland 2280). 12th century.

Add MS 37005, Gospel Lectionary (Gregory-Aland l 1493). 11th century.

Add MS 37006, Gospel Lectionary with ekphonetic neumes (Gregory-Aland l 1494 [=l 460]). 12th century, with late 13th-century replacements, including a full-page miniature of Christ and a figure identified as Andronicus II Palaeologus (Byzantine emperor 1282-1328) (illustrated above).

Add MS 38538, New Testament, Acts and Epistles (Gregory-Aland 2484), with Euthalian apparatus. Written by the scribe John in 1312.

Add MS 39589, Psalter (Rahlfs 1092) with introduction and commentary based on that of Euthymius Zigabenus (PG 128), attributed in the manuscript to Nicephorus Blemmydes, imperfect, with ornamental headpieces and the remains of a miniature of the Psalmist. 2nd half of the 12th century.

Add MS 39590, New Testament, without the book of Revelation (Gregory-Aland 547; Scrivener evan. 534; von Soden δ 157). 11th century.

Add MS 39593, Four Gospels (Gregory-Aland 550; Scrivener evan. 537; von Soden ε 250), with prefaces taken from the commentary of Theophylact, and synaxaria. 12th century.

Add MS 39612, Revelation (Gregory-Aland 2041; Scrivener apoc. 96; von Soden α 1475). The quire-numbers on ff 1v and 10v show the manuscript formed part of a larger volume, possibly Athos, Karakallou 121 (268) (Gregory-Aland 1040). 14th century, possibly Mount Athos.

Add MS 39623, Fragments from a Gospel Lectionary (Gregory-Aland l 1742). Late 14th century, possibly Mount Athos.

Egerton MS 3145, Epistles and Revelation (Gregory-Aland 699; Scrivener paul. 266; von Soden δ 104), concluding portion of the manuscript of the entire New Testament of which Add. MS 28815 is the earlier portion. Mid-10th century, Constantinople. Also online is an old (18th century?) binding for this manuscript.

I know a number of scholars who will be happy to learn of this latest batch of NT manuscripts. (I am awaiting similar projects with 3-D imaging of cuneiform.)

Cillian O’Hogan is fortunate to work at an institution that fosters scholarship, biblical and otherwise.

### Codex Sinaiticus Added to Digitised Manuscripts

Tuesday, March 25th, 2014

Codex Sinaiticus Added to Digitised Manuscripts by Julian Harrison.

From the post (I have omitted the images, see the original post for those):

Codex Sinaiticus is one of the great treasures of the British Library. Written in the mid-4th century in the Eastern Mediterranean (possibly at Caesarea), it is one of the two oldest surviving copies of the Greek Bible, along with Codex Vaticanus, in Rome. Written in four narrow columns to the page (aside from in the Poetic books, in two columns), its visual appearance is particularly striking.

The significance of Codex Sinaiticus for the text of the New Testament is incalculable, not least because of the many thousands of corrections made to the manuscript between the 4th and 12th centuries.

The manuscript itself is now distributed between four institutions: the British Library, the Universitäts-Bibliothek at Leipzig, the National Library of Russia in St Petersburg, and the Monastery of St Catherine at Mt Sinai. Several years ago, these four institutions came together to collaborate on the Codex Sinaiticus Project, which resulted in full digital coverage and transcription of all extant parts of the manuscript. The fruits of these labours, along with many additional essays and scholarly resources, can be found on the Codex Sinaiticus website.

The British Library owns the vast majority of Codex Sinaiticus and only the British Library portion is being released as part of the Digitised Manuscripts project.

The world in which biblical scholarship is done has changed radically over the last 20 years.

This effort by the British Library should be applauded and supported.

### Dead Sea Scrolls Updated!

Wednesday, February 5th, 2014

Well, actually not! 😉 but the Leon Levy Dead Sea Scrolls Digital Library has been upgraded!

From their Facebook page:

A second, upgraded version of the Leon Levy Dead Sea Scrolls Digital Library was launched today. Visitors to the new website (www.deadseascrolls.org.il) will be able to view and explore 10,000 newly uploaded images of unprecedented quality. The website also offers accompanying explanations pertaining to a variety of manuscripts, such as the book of Exodus written in paleo-Hebrew script, the books of Samuel, the Temple Scroll, Songs of Shabbat Sacrifice, and New Jerusalem.

The upgraded website comprises many improvements: 10,000 new multispectral images, improved metadata, additional manuscript descriptions, content pages translated into Russian and German in addition to the current languages, a faster search engine, easy access from the site to the facebook page and to twitter and more.

Imagine that! 10,000 new images.

Pass this on to academic publishers worried about control over works they can’t give away.

The ranks of the “frightened by public access” to non-commercial content are growing thinner.

### Ancient texts published online…

Friday, December 13th, 2013

Ancient texts published online by the Bodleian and the Vatican Libraries

From the post:

The Bodleian Libraries of the University of Oxford and the Biblioteca Apostolica Vaticana (BAV) have digitized and made available online some of the world’s most unique and important Bibles and biblical texts from their collections, as the start of a major digitization initiative undertaken by the two institutions.

The digitized texts can be accessed on a dedicated website which has been launched today (http://bav.bodleian.ox.ac.uk). This is the first launch of digitized content in a major four-year collaborative project.

Portions of the Bodleian and Vatican Libraries’ collections of Hebrew manuscripts, Greek manuscripts, and early printed books have been selected for digitization by a team of scholars and curators from around the world. The selection process has been informed by a balance of scholarly and practical concerns; conservation staff at the Bodleian and Vatican Libraries have worked with curators to assess not only the significance of the content, but the physical condition of the items. While the Vatican and the Bodleian have each been creating digital images from their collections for a number of years, this project has provided an opportunity for both libraries to increase the scale and pace with which they can digitize their most significant collections, whilst taking great care not to expose books to any damage, as they are often fragile due to their age and condition.

The newly-launched website features zoomable images which enable detailed scholarly analysis and study. The website also includes essays and a number of video presentations made by scholars and supporters of the digitization project including the Archbishop of Canterbury and Archbishop Jean-Louis Bruguès, o.p. The website blog will also feature articles on the conservation and digitization techniques and methods used during the project. The website is available both in English and Italian.

Originally announced in April 2012, the four-year collaboration aims to open up the two libraries’ collections of ancient texts and to make a selection of remarkable treasures freely available online to researchers and the general public worldwide. Through the generous support of the Polonsky Foundation, this project will make 1.5 million digitized pages freely available over the next three years.

Only twenty-one (21) works up now but 1.5 million pages by the end of the project. This is going to be a treasure trove without end!

Associating these items with their cultural contexts of production, influence on other works, textual history, comments by subsequent works, across multiple languages, is a perfect fit for topic maps.

Kudos to both the Bodleian and the Vatican Libraries!

### Greek New Testament (with syntax trees)

Monday, August 12th, 2013

Greek New Testament (with syntax trees)

If you are tired of the same old practice data sets, I may have a treat for you!

The Asia Bible Society has produced syntax trees for the New Testament, using the SBL Greek New Testament text.

To give you an idea of the granularity of the data, the first sentence in Matthew is spread over forty-nine (49) lines of markup.

Not big data in the usual sense but important data.

### bibleQuran: Comparing the Word Frequency between Bible and Quran

Friday, November 4th, 2011

bibleQuran: Comparing the Word Frequency between Bible and Quran

From the post:

bibleQuran [pitchinteractive.com] by datavis design firm Pitch Interactive reveals the frequency of word usage between two of the most important holy books: the Bible and the Quran.

The densely populated interactive visualization allows people to search for any word (and similar variations of that word) to explore its frequency in both texts. As each verse is always visible, one is able to compare the relative density of ideas and topics between both passages. For instance, one could select verbs that represent acts of ‘terror’ or ‘love’, and investigate which book discusses the topics more. The appropriate little rectangles, each representing an according verse, which include such this chosen word, are then highlighted, and can be read in detail by hovering the mouse over them.

In addition to being a great graphic presentation of information, with my background and appreciation for both texts, you know why I had to include this post.

I like the synonym feature, although I reserve judgment on what is considered a synonym. 😉 I would have to read the original. Translations of both texts are, well, translations. Not really the same text in a very real sense of the word.

Just as a suggestion, I would do the word count statistics separately for the Old/New Testament.
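A rough sketch of what I mean, in Python. The corpus names and the tiny sample strings are placeholders. You would substitute the full text of each collection, and the same caveat about translations applies.

```python
import re
from collections import Counter


def relative_frequency(text: str, word: str) -> float:
    """Return the share of tokens in text that equal word (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[word.lower()] / len(tokens)


def compare(corpora, word):
    """Return occurrences per 1,000 words of `word` for each named corpus."""
    return {name: 1000 * relative_frequency(text, word)
            for name, text in corpora.items()}


if __name__ == "__main__":
    # Placeholder snippets only; load the complete texts in real use.
    corpora = {
        "Old Testament": "love the stranger for you were strangers",
        "New Testament": "love your enemies and love your neighbor",
        "Quran": "and he is the forgiving the loving",
    }
    for name, rate in compare(corpora, "love").items():
        print(f"{name}: {rate:.1f} per 1,000 words")
```

Normalizing by the length of each corpus matters because the three collections differ greatly in size, so raw counts would mislead.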

Word of warning: Loads great with Firefox (7.1) on Windows XP, doesn’t load with IE 8 on Windows XP, doesn’t load with Firefox (3.6) on Ubuntu 10.04. So, your experience may vary.

Comments from users with other browser/OS combinations?