Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 15, 2014

Shining a light into the BBC Radio archives

Filed under: Archives,Audio,Auto Tagging,BBC,British Library,British Museum,Radio — Patrick Durusau @ 9:23 am

Shining a light into the BBC Radio archives by Yves Raimond, Matt Hynes, and Rob Cooper.

From the post:

One of the biggest challenges for the BBC Archive is how to open up our enormous collection of radio programmes. As we’ve been broadcasting since 1922 we’ve got an archive of almost 100 years of audio recordings, representing a unique cultural and historical resource.

But the big problem is how to make it searchable. Many of the programmes have little or no meta-data, and the whole collection is far too large to process through human efforts alone.

Help is at hand. Over the last five years or so, technologies such as automated speech recognition, speaker identification and automated tagging have reached a level of accuracy where we can start to get impressive results for the right type of audio. By automatically analysing sound files and making informed decisions about the content and speakers, these tools can effectively help to fill in the missing gaps in our archive’s meta-data.

The Kiwi set of speech processing algorithms

COMMA is built on a set of speech processing algorithms called Kiwi. Back in 2011, BBC R&D were given access to a very large speech radio archive, the BBC World Service archive, which at the time had very little meta-data. In order to build our prototype around this archive we developed a number of speech processing algorithms, reusing open-source building blocks where possible. We then built the following workflow out of these algorithms:

  • Speaker segmentation, identification and gender detection (using LIUM diarization toolkit, diarize-jruby and ruby-lsh). This process is also known as diarisation. Essentially an audio file is automatically divided into segments according to the identity of the speaker. The algorithm can show us who is speaking and at what point in the sound clip.
  • Speech-to-text for the detected speech segments (using CMU Sphinx). At this point the spoken audio is translated as accurately as possible into readable text. This algorithm uses models built from a wide range of BBC data.
  • Automated tagging with DBpedia identifiers. DBpedia is a large database holding structured data extracted from Wikipedia. The automatic tagging process creates the searchable meta-data that ultimately allows us to access the archives much more easily. This process uses a tool we developed called ‘Mango’.

…

COMMA is due to launch some time in April 2015. If you’d like to be kept informed of our progress you can sign up for occasional email updates here. We’re also looking for early adopters to test the platform, so please contact us if you’re a cultural institution, media company or business that has large audio data-set you want to make searchable.

This article was written by Yves Raimond (lead engineer, BBC R&D), Matt Hynes (senior software engineer, BBC R&D) and Rob Cooper (development producer, BBC R&D)

I don’t have a large audio data-set but I am certainly going to be following this project. The results should be useful in and of themselves, to say nothing of being a good starting point for further tagging. I wonder if the BBC Sanskrit broadcasts are going to be available? I will have to check on that.
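To make the three-step workflow quoted above concrete, here is a minimal sketch of how diarised, transcribed segments could be turned into a tag index. It is not the BBC's code: the `Segment` class, the `GAZETTEER` lookup, and the `programme_001` identifier are all hypothetical, and simple phrase matching stands in for the real tagger ("Mango"), whose internals the post does not describe.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One diarised stretch of audio: who spoke, when, and the transcript."""
    speaker: str
    start: float  # seconds from the start of the programme
    end: float
    text: str
    tags: list = field(default_factory=list)


# Hypothetical mini-gazetteer: surface phrase -> DBpedia resource identifier.
GAZETTEER = {
    "world service": "http://dbpedia.org/resource/BBC_World_Service",
    "sanskrit": "http://dbpedia.org/resource/Sanskrit",
}


def tag_segment(segment):
    """Attach an identifier for every gazetteer phrase found in the transcript."""
    lowered = segment.text.lower()
    segment.tags = sorted(
        uri for phrase, uri in GAZETTEER.items() if phrase in lowered
    )
    return segment


def build_index(programme_id, segments):
    """Map each tag to the (programme, start time) pairs where it occurs."""
    index = defaultdict(list)
    for segment in segments:
        for uri in tag_segment(segment).tags:
            index[uri].append((programme_id, segment.start))
    return dict(index)


# Invented example input, standing in for diarisation and speech-to-text output.
segments = [
    Segment("speaker_1", 0.0, 12.5, "This is the World Service."),
    Segment("speaker_2", 12.5, 40.0, "Today's broadcast is in Sanskrit."),
]
index = build_index("programme_001", segments)
```

Looking up a DBpedia identifier in `index` returns the programmes and offsets where that subject is mentioned, which is the kind of searchable meta-data the post is after.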

Without diminishing the achievements of other institutions, the efforts of the BBC, the British Library, and the British Museum are truly remarkable.

I first saw this in a tweet by Mike Jones.

November 16, 2014

Greek Digitisation Project Update: 40 Manuscripts Newly Uploaded

Filed under: British Library,Manuscripts — Patrick Durusau @ 7:43 pm

Greek Digitisation Project Update: 40 Manuscripts Newly Uploaded by Sarah J Biggs.

From the post:

We have now passed the half-way point of this phase of the Greek Manuscripts Digitisation Project, generously funded by the Stavros Niarchos Foundation and many others, including the A. G. Leventis Foundation, Sam Fogg, the Sylvia Ioannou Foundation, the Thriplow Charitable Trust, and the Friends of the British Library. What treasures are in store for you this month? To begin with, there are quite a few interesting 17th- and 18th-century items to look at, including two very fine 18th-century charters, with seals intact, an iconographic sketch-book (Add MS 43868), and a fascinating Greek translation of an account of the siege of Vienna in 1683 (Add MS 38890). We continue to upload some really exciting Greek bindings – of particular note here are Add MS 24372 and Add MS 36823. A number of scrolls have also been uploaded, mostly containing the Liturgy of Basil of Caesarea. A number of Biblical manuscripts are included, too, but this month two manuscripts of classical authors take pride of place: Harley MS 5600, a stunning manuscript of the Iliad from 15th-century Florence, and Burney MS 111, a lavishly decorated copy of Ptolemy’s Geographia.

Additional riches from the British Library!

Enjoy!

November 1, 2014

Guess the Manuscript XVI

Filed under: British Library,Image Processing,Image Recognition,Image Understanding — Patrick Durusau @ 7:55 pm

Guess the Manuscript XVI

From the post:

Welcome to the sixteenth instalment of our popular Guess the Manuscript series. The rules are simple: we post an image of part of a manuscript that is on the British Library’s Digitised Manuscripts site, you guess which one it’s taken from!

[Image: bl mss XVI]

Are you as surprised as we are to find an umbrella in a medieval manuscript? The manuscript from which this image was taken will feature in a blogpost in the near future.

In the meantime, answers or guesses please in the comments below, or via Twitter @BLMedieval.

Caution! The Medieval Period lasted from five hundred (500) C.E. until fifteen hundred (1500) C.E. Google NGrams records the first use of “umbrella” at or around sixteen-sixty (1660). Is this an “umbrella” or something else?

Google’s reverse image search found only repostings of the challenge image, no similar images. Not sure that helps but it was worth a try.

On the bright side, there are only two hundred and fifty-seven (257) manuscripts in the digitized collection dated between five hundred (500) C.E. and fifteen hundred (1500) C.E.

What stories or information can be found in those volumes that might be accompanied by such an image? Need to create a list of the classes of those manuscripts.
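A rough sketch of that list-building step, assuming the catalogue can be exported as records with a date range and a class. The `Record` fields, the "Example MS" shelfmarks, and the class labels are invented placeholders, not entries from the Digitised Manuscripts site.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical catalogue record; field names are assumptions."""
    shelfmark: str
    year_from: int
    year_to: int
    genre: str


def medieval(records, start=500, end=1500):
    """Keep records whose whole date range falls inside [start, end]."""
    return [r for r in records if start <= r.year_from and r.year_to <= end]


def classes(records):
    """Count how many records fall into each class."""
    return Counter(r.genre for r in records)


# Invented sample data, only to show the shape of the input.
catalogue = [
    Record("Example MS 1", 1100, 1150, "psalter"),
    Record("Example MS 2", 1400, 1425, "bestiary"),
    Record("Example MS 3", 1600, 1650, "charter"),
    Record("Example MS 4", 1250, 1300, "psalter"),
]
counts = classes(medieval(catalogue))
```

With the real 257 records in place of the sample data, `counts` would show which classes of manuscript are worth checking first for an image like this one.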

Suggestions? Is there an image processor in the house?

Enjoy!

October 18, 2014

Another Greek update: Forty-six more manuscripts online!

Filed under: British Library,Manuscripts — Patrick Durusau @ 8:25 pm

Another Greek update: Forty-six more manuscripts online! by Sarah J. Biggs.

From the post:

It’s time for a monthly progress report on our Greek Manuscripts Digitisation Project, generously funded by the Stavros Niarchos Foundation and many others, including the A. G. Leventis Foundation, Sam Fogg, the Sylvia Ioannou Foundation, the Thriplow Charitable Trust, and the Friends of the British Library. There are some very exciting items in this batch, most notably the famous Codex Crippsianus (Burney MS 95), the most important manuscript for the text of the Minor Attic Orators; Egerton MS 942, a very fine copy of Demosthenes; a 19th-century poem and prose narrative on the Greek Revolution (Add MS 35072); a number of collections of 16th- and 17th-century complimentary verses in Greek and Latin dedicated to members of the Royal Family; and an exciting array of classical and patristic texts.

These are texts that helped to shape the world we experience today. So did others, but Greek texts played a special role in European history.

You can find ways to support the Greek Digitization project here.

I prefer, ahem, other material and for that you can consult:

The Latest, Greatest, Up-To-Datest Giant List of Digitised Manuscripts Hyperlinks.

Which lists 1111 (eleventy-one-one?) manuscripts. Quite impressive.

Do consider supporting the British Library in this project and others. Some profess interest in sharing our common heritage. The British Library is sharing our common heritage. Your choice.

September 14, 2014

Forty-four More Greek Manuscripts Online

Filed under: British Library,Manuscripts — Patrick Durusau @ 3:49 pm

Forty-four More Greek Manuscripts Online by James Freeman.

From the post:

We are delighted to announce another forty-four Greek manuscripts have been digitised. As always, we are most grateful to the Stavros Niarchos Foundation, the A. G. Leventis Foundation, Sam Fogg, the Sylvia Ioannou Foundation, the Thriplow Charitable Trust, the Friends of the British Library, and our other generous benefactors for contributing to the digitisation project. Happy exploring!

A random sampling:

Add MS 31921, Gospel Lectionary with ekphonetic notation (Gregory-Aland l 336), imperfect, 12th century, with some leaves supplied in the 14th century. Formerly in Blenheim Palace Library.

Add MS 34059, Gospel Lectionary (Gregory-Aland l 939), with ekphonetic neumes. 12th century.

Add MS 36660, Old Testament lectionary with ekphonetic notation, and fragments from a New Testament lectionary (Gregory-Aland l 1490). 12th century.

Add MS 37320, Four Gospels (Gregory-Aland 2290). 10th century, with additions from the 16th-17th century.

….

Burney MS 106, Sophocles, Ajax, Electra, Oedipus Tyrannus, Antigone; [Aeschylus], Prometheus Vinctus; Pindar, Olympia. End of the 15th century.

Burney MS 108, Aelian, Tactica; Leo VI, Tactica; Heron of Alexandria, Pneumatica, De automatis, with numerous diagrams. 1st quarter of the 16th century, possibly written at Venice.

Burney MS 109, Works by Theocritus, Hesiod, Pindar, Pythagoras and Aratus. 2nd half of the 14th century, Italy.

And many more!

Given the complex histories of the texts witnessed by these Greek manuscripts, their interpretations and commentaries, to say nothing of the history of the manuscripts per se, they are rich subjects that merit treatment with a topic map.

Be sure to visit the other treasures of the British Library. It is an exemplar of how an academic institution should function.

May 29, 2014

Discovering Literature: Romantics and Victorians

Filed under: British Library,Literature — Patrick Durusau @ 2:26 pm

Discovering Literature: Romantics and Victorians (British Library)

From “About this project:”

Exploring the Romantic and Victorian periods, Discovering Literature brings together, for the first time, a wealth of the British Library’s greatest literary treasures, including numerous original manuscripts, first editions and rare illustrations.

A rich variety of contextual material – newspapers, photographs, advertisements and maps – is presented alongside personal letters and diaries from iconic authors. Together they bring to life the historical, political and cultural contexts in which major works were written: works that have shaped our literary heritage.

William Blake’s notebook, childhood writings of the Brontë sisters, the manuscript of the Preface to Charles Dickens’s Oliver Twist, and an early draft of Oscar Wilde’s The Importance of Being Earnest are just some of the unique collections available on the site.

Discovering Literature features over 8000 pages of collection items and explores more than 20 authors through 165 newly-commissioned articles, 25 short documentary films, and 30 lesson plans. More than 60 experts have contributed interpretation, enriching the website with contemporary research. Designed to enhance the study and enjoyment of English literature, the site contains a dedicated Teachers’ Area supporting the curriculum for GCSE and A Level students.

These great works from the Romantic and Victorian periods form the first phase of a wider project to digitise other literary eras, including the 20th century.

On a whim I searched for Bleak House only to find: Bleak House first edition with illustrations, which includes images of the illustrations and the text. Moreover, it has related links, one of which is a review of Jude the Obscure that appeared in the Morning Post.

From the review:

To write a story of over five hundred pages, and longer by far than the majority of three-volume novels, without allowing one single ray of humour, or even cheerfulness, to dispel for a moment the gloomy atmosphere of hopeless pessimism was no ordinary task, and might have taxed the powers of the most relentless observers of life. Even Euripides, had he been given to the writing of novels, might well have faltered before such a tremendous undertaking.

Can you imagine finding such a review on Amazon.com?

Mapping Bleak House into then current legal practice or Jude the Obscure into social customs and records of the time would be fascinating summer projects.

April 20, 2014

The Next Giant List of Digitised Manuscript Hyperlinks

Filed under: British Library,Manuscripts — Patrick Durusau @ 10:50 am

The Next Giant List of Digitised Manuscript Hyperlinks by Sarah J. Biggs.

From the post:

It’s that time of year again, friends – when we inflict our quarterly massive list of manuscript hyperlinks upon an unsuspecting public. As always, this list contains everything that has been digitised up to this point by the Medieval and Earlier Manuscripts department, complete with hyperlinks to each record on our Digitised Manuscripts site. There will be another updated list here on the blog in three months; you can download the current version here: Download BL Medieval and Earlier Digitised Manuscripts Master List 10.04.13. Have fun!

The listing has reached one of my favorites: Yates Thompson MS 36, also known as: Dante Alighieri, Divina commedia. Publication date proposed to be after 1444. (Warning: Do not view with Chrome. Warns of a “redirect loop.” Displays fine with Firefox.)

Great description of the manuscript plus three hundred and ninety-nine (399) images.

But it does seem to just lie there, doesn’t it?

Suggestions?

March 25, 2014

Codex Sinaiticus Added to Digitised Manuscripts

Filed under: Bible,British Library,Library,Manuscripts — Patrick Durusau @ 2:45 pm

Codex Sinaiticus Added to Digitised Manuscripts by Julian Harrison.

From the post (I have omitted the images, see the original post for those):

Codex Sinaiticus is one of the great treasures of the British Library. Written in the mid-4th century in the Eastern Mediterranean (possibly at Caesarea), it is one of the two oldest surviving copies of the Greek Bible, along with Codex Vaticanus, in Rome. Written in four narrow columns to the page (aside from in the Poetic books, in two columns), its visual appearance is particularly striking.

The significance of Codex Sinaiticus for the text of the New Testament is incalculable, not least because of the many thousands of corrections made to the manuscript between the 4th and 12th centuries.

The manuscript itself is now distributed between four institutions: the British Library, the Universitäts-Bibliothek at Leipzig, the National Library of Russia in St Petersburg, and the Monastery of St Catherine at Mt Sinai. Several years ago, these four institutions came together to collaborate on the Codex Sinaiticus Project, which resulted in full digital coverage and transcription of all extant parts of the manuscript. The fruits of these labours, along with many additional essays and scholarly resources, can be found on the Codex Sinaiticus website.

The British Library owns the vast majority of Codex Sinaiticus and only the British Library portion is being released as part of the Digitised Manuscripts project.

The world in which biblical scholarship is done has changed radically over the last 20 years.

This effort by the British Library should be applauded and supported.

January 21, 2014

Yet Another Giant List of Digitised Manuscript Hyperlinks

Filed under: British Library,Digital Library,Manuscripts — Patrick Durusau @ 11:41 am

Yet Another Giant List of Digitised Manuscript Hyperlinks

From the post:

A new year, a newly-updated list of digitised manuscript hyperlinks! This master list contains everything that has been digitised up to this point by the Medieval and Earlier Manuscripts department, complete with hyperlinks to each record on our Digitised Manuscripts site. We’ll have another list for you in three months; you can download the current version here: Download BL Medieval and Earlier Digitised Manuscripts Master List 14.01.13. Have fun!

I count 921 digitized manuscripts, with more on the way!

A highly selective sampling:

That leaves 917 manuscripts for you to explore! With more on the way!

CAUTION! When I try to use Chrome on Ubuntu to access these links, I get: “This webpage has a redirect loop.” The same links work fine in Firefox. I have posted a comment about this issue to the post. Will update when I have more news. If your experience is the same or different, let me know. Just curious.

Enjoy!

PS:

Vote by midnight January 26, 2014 to promote the Medieval Manuscripts Blog.

Vote for Medieval Manuscripts Blog in the UK Blog Awards
