Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 24, 2017

Behind the First Arab Data Journalists’ Network – Open For Collaborations!

Filed under: Arabic,Journalism,News,Reporting — Patrick Durusau @ 8:23 pm

Behind the First Arab Data Journalists’ Network

From the post:

When it comes to data journalism in the Middle East, one name stands out. Amr Eleraqi is the data journalist spreading data journalism to the Middle East. In 2012, he launched infotimes.org, the first Arabic website specializing in data journalism in the region. Since then, Eleraqi and his organization have both been nominated for GEN Data Journalism Awards — once in 2015 as an individual, and the second in 2016 for the best data visualization website of the year.

His goal: to introduce Arab journalists to the concept of data visualization as a new tool for storytelling. It worked. As the site grew so did the interest of Arab journalists in the field of data journalism. So he and a team of nine recently launched the first Arab Data Journalists’ Network. Advocacy Assembly spoke with Eleraqi to learn more about the network and how it’s changing the scene for Arab journalists.

The website, Arab Data Journalists’ Network, is available in three languages (Arabic, English, French) and focuses on Arabic-language educational material for Arab journalists.

The @gijn tweet where I saw this says to contact @arabdjn or @aeleraqi for collaborations!

Excellent opportunity to expand your news awareness and data journalism contacts.

July 11, 2017

Open Islamicate Texts Initiative (OpenITI)

Filed under: Arabic,Islam,Literature,Text Corpus,Texts — Patrick Durusau @ 4:37 pm

From the description (Annotation) of the project:

Books are grouped into authors. All authors are grouped into 25 AH periods, based on the year of their death. These repositories are the main working loci—if any modifications are to be added or made to texts or metadata, all has to be done in files in these folders.

There are three types of text repositories:

  • RAWrabicaXXXXXX repositories include raw texts as they were collected from various open-access online repositories and libraries. These texts are in their initial (raw) format and require reformatting and further integration into OpenITI. The overall current number of text files is over 40,000; slightly over 7,000 have been integrated into OpenITI.
  • XXXXAH are the main working folders that include integrated texts (all coming from collections included into RAWrabicaXXXXXX repositories).
  • i.xxxxx repositories are instantiations of the OpenITI corpus adapted for specific forms of analysis. At the moment, these include the following instantiations (in progress):
    • i.cex with all texts split mechanically into 300 word units, converted into cex format.
    • i.mech with all texts split mechanically into 300 word units.
    • i.logic with all texts split into logical units (chapters, sections, etc.); only tagged texts are included here (~130 texts at the moment).
    • i.passim_new_mech with all texts split mechanically into 300 word units, converted for the use with new passim (JSON).
    • [not created yet] i.passim_new_mech_cluster with all text split mechanically into 900 word units (3 milestones) with 300 word overlap; converted for the use with new passim (JSON).
    • i.passim_old_mech with all texts split mechanically into 300 word units, converted for the use with old passim (XML, gzipped).
    • i.stylo includes all texts from OpenITI (duplicates excluded) that are renamed and slightly reformatted (Arabic orthography is simplified) for the use with stylo R-package.
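
The "mechanical" splitting mentioned in the list above is easy to picture with a small sketch. What follows is not OpenITI's code. It assumes plain whitespace tokenization (the project's own tokenization and milestone tagging may well differ), and the function names `mechanical_units` and `overlapping_units` are hypothetical. It only illustrates fixed 300 word units and 900 word units that share 300 words with their neighbor.

```python
def mechanical_units(text, unit_size=300):
    """Split text into consecutive units of at most unit_size words.

    Assumption: words are whatever str.split() yields on whitespace.
    """
    words = text.split()
    return [
        " ".join(words[start:start + unit_size])
        for start in range(0, len(words), unit_size)
    ]


def overlapping_units(text, unit_size=900, overlap=300):
    """Split text into units of at most unit_size words, where each unit
    repeats the last `overlap` words of the previous unit.
    """
    if not 0 <= overlap < unit_size:
        raise ValueError("overlap must be >= 0 and smaller than unit_size")
    words = text.split()
    step = unit_size - overlap  # 600 words forward per unit by default
    units = []
    for start in range(0, len(words), step):
        units.append(" ".join(words[start:start + unit_size]))
        if start + unit_size >= len(words):
            break  # this unit already reaches the end of the text
    return units
```

With the defaults, each 900 word unit covers three 300 word units and advances by 600 words, so neighboring units share one 300 word stretch, which is one way to read "900 word units (3 milestones) with 300 word overlap."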

A project to join if you want to hone your Arabic NLP and reading skills.

Enjoy!
