Archive for the ‘SKOS’ Category

Running Spark GraphX algorithms on Library of Congress subject heading SKOS

Monday, May 4th, 2015

Running Spark GraphX algorithms on Library of Congress subject heading SKOS by Bob DuCharme.

From the post:

Well, one algorithm, but a very cool one.

Last month, in Spark and SPARQL; RDF Graphs and GraphX, I described how Apache Spark has emerged as a more efficient alternative to MapReduce for distributing computing jobs across clusters. I also described how Spark’s GraphX library lets you do this kind of computing on graph data structures and how I had some ideas for using it with RDF data. My goal was to use RDF technology on GraphX data and vice versa to demonstrate how they could help each other, and I demonstrated the former with a Scala program that output some GraphX data as RDF and then showed some SPARQL queries to run on that RDF.

Today I’m demonstrating the latter by reading in a well-known RDF dataset and executing GraphX’s Connected Components algorithm on it. This algorithm collects nodes into groupings that connect to each other but not to any other nodes. In classic Big Data scenarios, this helps applications perform tasks such as the identification of subnetworks of people within larger networks, giving clues about which products or cat videos to suggest to those people based on what their friends liked.
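To make "groupings that connect to each other but not to any other nodes" concrete, here is a minimal Python sketch of connected components using union-find. It is not Bob's Scala program and not the GraphX API; the function name and the sample labels are made up for illustration and are not actual Library of Congress relations.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes so each group is connected internally and to nothing else."""
    parent = {}

    def find(node):
        # Follow parent pointers to the representative of the node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sort for a stable, readable report.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical "related" pairs, for illustration only.
sample_edges = [
    ("Privacy", "Solitude"),
    ("Solitude", "Loneliness"),
    ("Cats", "Dogs"),
]
components = connected_components(sample_edges)
```

Each inner list is one grouping; a node that appears in no edge never enters the structure, so isolated concepts would have to be added separately.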

As so typically happens when you are reading one Bob DuCharme post, you see another one that requires reading!

Bob covers storing RDF in an RDD (Resilient Distributed Dataset), the basic Spark data structure, then creating the report on connected components, and ends with heavily commented code for his program.

Sadly the “related” values assigned by the Library of Congress don’t say how or why the values are related, such as:

“Hiding places”

Related values could be useful in some cases, but if I am searching on “privacy,” in the sense of being free from government intrusion, then “solitude,” “loneliness,” and “hiding places” aren’t likely to be helpful.

That’s not a problem with Spark or SKOS, but a limitation of the data being provided.

Want to see how #SchemaOrg #Dbpedia and #SKOS taxonomies can be seamlessly integrated?

Friday, September 12th, 2014

Want to see how #SchemaOrg #Dbpedia and #SKOS taxonomies can be seamlessly integrated? Register for our webinar:

is how the tweet read.

From the seminar registration page:

With the arrival of semantic web standards and linked data technologies, new options for smarter content management and semantic search have become available. Taxonomies and metadata management shall play a central role in your content management system: By combining text mining algorithms with taxonomies and knowledge graphs from the web a more accurate annotation and categorization of documents and more complex queries over text-oriented repositories like SharePoint, Drupal, or Confluence are now possible.

Nevertheless, the predominant opinion that taxonomy management is a tedious process currently impedes a widespread implementation of professional metadata strategies.

In this webinar, key people from the Semantic Web Company will describe how content management and collaboration systems like SharePoint, Drupal or Confluence can benefit from professional taxonomy management. We will also discuss why taxonomy management is not necessarily a tedious process when well integrated into content management workflows.

I’ve had mixed luck with webinars this year. Some were quite good and others were equally bad.

I have fairly firm opinions about #SchemaOrg, #Dbpedia and #SKOS taxonomies, but tedium isn’t one of them. 😉

You can register for free for: Webinar “Taxonomy management & content management – well integrated!”, October 8th, 2014.

The usual marketing harvesting of contact information applies. Linux users will have to use a VM running Windows or Mac OS.

If you attend, be sure to look for my post reviewing the webinar and post your comments there.

Cross-Scheme Management in VocBench 2.1

Sunday, April 13th, 2014

Cross-Scheme Management in VocBench 2.1 by Armando Stellato.

From the post:

One of the main features of the forthcoming VB2.1 will be SKOS Cross-Scheme Management

I started drafting some notes about cross-scheme management here:

I think it is important to have all the integrity checks related to this aspect clear for humans, and not only have them sealed deep in the code. These notes will help users get acquainted with this feature in advance. Once completed, these will be included also in the manual of VB.

For the moment I’ve only written the introduction, some notes about data integrity and then described the checks carried upon the most dangerous operation: removing a concept from a scheme. Together with the VB development group, we will add more information in the next days. However, if you have some questions about this feature, you may post them here, as usual (or you may use the vocbench user/developer user groups).

A consistent set of operations and integrity checks for cross-scheme are already in place for this 2.1, which will be released in the next days.

VB2.2 will focus on other aspects (multi-project management), while we foresee a second wave of facilities for cross-scheme management (such as mass-move/add/remove actions, fixing utilities, analysis of dangling concepts, corrective actions etc..) for VB2.3

I agree that:

I think it is important to have all the integrity checks related to this aspect clear for humans, and not only have them sealed deep in the code.

But I am less certain that following the integrity checks of SKOS is useful in all mappings between schemes.

If you are interested in such constraints, see Armando’s notes.

Aligning Controlled Vocabularies

Tuesday, March 18th, 2014

Tutorial on the use of SILK for aligning controlled vocabularies

From the post:

A tutorial on the use of SILK has been published. The SILK framework is a tool for discovering relationships between data items within different Linked Data sources. This tutorial explains how SILK can be used to discover links between concepts in controlled vocabularies.

Example used in this Tutorial

The tutorial uses an example where SILK is used to create a mapping between the Named Authority Lists (NALs) of the Publications Office of the EU and the MARC countries list of the US Library of Congress. Both controlled vocabularies (NALs & MARC Countries list) use URIs to identify countries; compare, for example, the following URIs for the country of Luxembourg

SILK represents mappings between NALs using the SKOS language (skos:exactMatch). In the case of the URIs for Luxembourg this is expressed as N-Triples:

The tutorial is here.

If you bother to look up the documentation on skos:exactMatch:

The property skos:exactMatch is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications. skos:exactMatch is a transitive property, and is a sub-property of skos:closeMatch.
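The two formal claims in that definition (transitivity, and being a sub-property of skos:closeMatch) can be sketched in a few lines of Python. This is only an illustration of what a reasoner would infer, not any SKOS library's API; the `ex:` identifiers are hypothetical placeholders, and other axioms such as symmetry are left out.

```python
def transitive_closure(pairs):
    """Return all pairs implied by treating the relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # If a matches b and b matches d, then a matches d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Asserted skos:exactMatch statements (hypothetical concepts).
exact_asserted = {("ex:A", "ex:B"), ("ex:B", "ex:C")}

# Transitivity: A exactMatch C follows from the two assertions.
exact_inferred = transitive_closure(exact_asserted)

# Sub-property: every exactMatch pair is also a closeMatch pair.
close_inferred = set(exact_inferred)
```

Two mappings made by different people for different purposes are enough to produce a third that nobody asserted, which is where the "high degree of confidence" starts to matter.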

Are you happy with “…a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications”?

I’m not really sure what that means.

Not to mention that if 97% of the people in a geographic region want a new government, some will say it can join a new country, but if the United States disagrees (for reasons best known to itself), then the will of 97% of the people is a violation of international law.

What? Too much democracy? I didn’t know that was a violation of international law.

If SKOS statements had some content, properties I suppose, along with authorship (and properties there as well), you could make an argument for skos:exactMatch being useful.

So far as I can see, it is not even a skos:closeMatch to “useful.”

Create better SKOS vocabularies

Tuesday, January 14th, 2014

Create better SKOS vocabularies

From the webpage:

PoolParty SKOS Quality Checker allows you to perform automated quality checks on controlled vocabularies. You will receive a report of our findings.

This service is based on qSKOS and is able to make checks on over 20 quality issues.

You will organize uploaded vocabularies by giving a name for which you may provide different versions of the same vocabulary. This way you can easily track quality improvements over time.

You won’t need this for simple vocabularies, but it could be useful for more complex vocabularies.

AGROVOC 2013 edition released

Monday, February 11th, 2013

AGROVOC 2013 edition released

From the post:

The AGROVOC Team is pleased to announce the release of the AGROVOC 2013 edition.

The updated version contains 32,188 concepts in up to 22 languages, resulting in a total of 626,211 terms (in 2012: 32,061 concepts, 625,096 terms).

Please explore AGROVOC by searching terms, or browsing hierarchies.

AGROVOC 2013 is available for download, and accessible via web services.

From the “about” page:

The AGROVOC thesaurus contains 32,188 concepts in up to 22 languages covering topics related to food, nutrition, agriculture, fisheries, forestry, environment and other related domains.

A global community of editors consisting of librarians, terminologists, information managers and software developers, maintain AGROVOC using VocBench, an open-source multilingual, web-based vocabulary editor and workflow management tool that allows simultaneous, distributed editing. AGROVOC is expressed in Simple Knowledge Organization System (SKOS) and published as Linked Data.

Need some seeds for your topic map in “…food, nutrition, agriculture, fisheries, forestry, environment and other related domains”?

New draft standard XKOS developed at Dagstuhl workshop

Saturday, November 17th, 2012

New draft standard XKOS developed at Dagstuhl workshop

From the post:

The AIMS team as part of its work in promoting good practices in information management participated in the development of the new draft standard XKOS at the Dagstuhl workshop “Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web” in Wadern, Germany, October 15-19, 2012. XKOS is an extension to the popular Simple Knowledge Organization System (SKOS), a W3C Recommendation, to meet the needs of classification schemes.

Improving visibility & discoverability of statistical data

XKOS is designed to facilitate the interoperability of micro and macro data both within and without the statistics domain and to be complementary to existing standards such as SDMX, DDI and RDF Data Cube. This proposed extension to SKOS may well become the basis for improving the visibility and discoverability of statistical data on the semantic web as well as a mechanism to maintain and disseminate classification schemes according to a standard, cross-domain, machine-readable format.

Acronym Safety Zone:

SDMX – Statistical Data and Metadata eXchange

DDI – Data Documentation Initiative

RDF Data Cube

Apologies, but I was unable to find a draft of XKOS to link to. Do be aware that XKOS is also the acronym for the Korean stock exchange. 😉


Wednesday, November 7th, 2012

VocBench

From the webpage:

VocBench is a web-based, multilingual, vocabulary editing and workflow tool developed by FAO. It transforms thesauri, authority lists and glossaries into SKOS/RDF concept schemes for use in a linked data environment. VocBench provides tools and functionalities that facilitate the collaborative editing of multilingual terminology and semantic concept information. It further includes administration and group management features as well as built in workflows for maintenance, validation and quality assurance of the data pool.

The current release is 1.3, but 2.0 is due out in “Autumn 2012” as open source under a GPL license.

Another tool that will be of interest to topic map authors.

Linked Legal Data: A SKOS Vocabulary for the Code of Federal Regulations

Sunday, August 26th, 2012

Linked Legal Data: A SKOS Vocabulary for the Code of Federal Regulations by Núria Casellas.


This paper describes the application of Semantic Web and Linked Data techniques and principles to regulatory information for the development of a SKOS vocabulary for the Code of Federal Regulations (in particular of Title 21, Food and Drugs). The Code of Federal Regulations is the codification of the general and permanent enacted rules generated by executive departments and agencies of the Federal Government of the United States, a regulatory corpus of large size, varied subject-matter and structural complexity. The CFR SKOS vocabulary is developed using a bottom-up approach for the extraction of terminology from text based on a combination of syntactic analysis and lexico-syntactic pattern matching. Although the preliminary results are promising, several issues (a method for hierarchy cycle control, expert evaluation and control support, named entity reduction, and adjective and prepositional modifier trimming) require improvement and revision before it can be implemented for search and retrieval enhancement of regulatory materials published by the Legal Information Institute. The vocabulary is part of a larger Linked Legal Data project, that aims at using Semantic Web technologies for the representation and management of legal data.

Considers use of nonregulatory vocabularies, conversion of existing indexing materials and finally settles on NLP processing of the text.
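The abstract lists "hierarchy cycle control" among the open issues. As a rough idea of what such a check involves, here is a Python sketch that looks for a cycle in a broader-term hierarchy with a depth-first search. It is not the paper's method; the `find_cycle` name, the dictionary layout and the sample terms are all assumptions made for illustration.

```python
def find_cycle(broader):
    """Return one cycle as a list of concepts, or None if there is none.

    `broader` maps each concept to the concepts it names as broader.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node, path):
        state[node] = IN_PROGRESS
        path.append(node)
        for parent in broader.get(node, ()):
            parent_state = state.get(parent, UNSEEN)
            if parent_state == IN_PROGRESS:
                # We walked back into the current path: that is a cycle.
                return path[path.index(parent):] + [parent]
            if parent_state == UNSEEN:
                cycle = visit(parent, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    # Recursive for brevity; a very deep hierarchy would need an explicit stack.
    for node in list(broader):
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


# Hypothetical extracted terms where "drug" ends up broader than itself.
cyclic = {"drug": ["product"], "product": ["article"], "article": ["drug"]}
acyclic = {"drug": ["product"], "product": ["article"], "article": []}
cycle_found = find_cycle(cyclic)
no_cycle = find_cycle(acyclic)
```

Finding the cycle is the easy part; deciding which link to cut is where the expert evaluation mentioned in the abstract comes in.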

Granting that Title 21, Food and Drugs, is no walk in the park, take a peek at the regulations for Title 26, Internal Revenue Code. 😉

A difficulty that I didn’t see mentioned is the changing semantics in statutory law and regulations.

The definition of “person,” for example, varies widely depending upon where it appears, both diachronically and synchronically.

Moreover, if I have a nonregulatory vocabulary and/or CFR indexes, why shouldn’t that map to the CFR SKOS vocabulary?

I may not have the “correct” index but the one I prefer to use. Shouldn’t that be enabled?

I first saw this at Legal Informatics.

Linked Media Framework [Semantic Web vs. ROI]

Tuesday, July 10th, 2012

Linked Media Framework

From the webpage:

The Linked Media Framework is an easy-to-setup server application that bundles central Semantic Web technologies to offer advanced services. The Linked Media Framework consists of LMF Core and LMF Modules.

LMF Usage Scenarios

The LMF has been designed with a number of typical use cases in mind. We currently support the following tasks out of the box:

Target groups are in particular casual users who are not experts in Semantic Web technologies but still want to publish or work with Linked Data, e.g. in the Open Government Data and Linked Enterprise Data area.

It is a bad assumption that workers in business or government have free time to add semantics to their data sets.

If adding semantics to your data, by linked data or other means, is a core value, resource the task just like any other, with your internal staff or hired outside help.

A Semantic Web shortcoming is the attitude that users are interested in building it or have the time to do so, assuming the project to be worthwhile and/or doable.

Users are fully occupied with tasks of their own and don’t need a technical elite tossing more tasks onto them. You want the Semantic Web? Suggest you get on that right away.

Integrated data that meets a business need and has proven ROI isn’t the same thing as the Semantic Web. Give me a call if you are interested in the former, not the latter. (I would do the latter as well, but only on your dime.)

I first saw this in an announcement of version 2.2.0 of lmf – Linked Media Framework.

LAC Releases Government of Canada Core Subject Thesaurus

Thursday, July 7th, 2011

LAC Releases Government of Canada Core Subject Thesaurus

From the post:

The government of Canada has released a new downloadable version of its Core Subject Thesaurus in SKOS/RDF format. According to Library and Archives Canada, “The Government of Canada Core Subject Thesaurus is a bilingual thesaurus consisting of terminology that represents all the fields covered in the information resources of the Government of Canada. Library and Archives Canada is exploring the potential for linked data and the semantic web with LAC vocabularies, metadata and open content.”

When you reach the post with links to the vocabulary, you will find it is also available as XML and CSV.

There are changes from the 2009 version.

Here’s an example:

old form | new form | French equivalent
Adaptive aids (for persons with disabilities) | Assistive Technologies | Technologie d’aide

Did you notice that the old form and new form don’t share a single word?

Imagine that, an unstable core subject thesaurus.

Over time, more terms will be added, changed and deleted. Is there a topic map in the house?

iQvoc 3.0 released

Monday, May 9th, 2011

iQvoc 3.0 released

A SKOS tool that is described on its “about” page as:

iQvoc is a web-based open source tool for managing vocabularies (classifications, thesauri, etc.). It combines an intuitive user interface with Semantic Web standards.

The navigation is intuitive, providing direct links and hierarchical tree visualizations. All common browsers are supported. Due to iQvoc’s modular architecture, its appearance can be easily and extensively customized.

iQvoc covers a comprehensive range of capabilities:

  • support for multiple languages in both the user interface and the content corpus (i.e. labels, notes etc.)
  • import/export of existing SKOS vocabularies
  • editorial control and workflow
  • notes and annotations
  • use of the vocabulary within the Linked Data network
  • modularity and extensibility


Tuesday, May 3rd, 2011

PoolParty

From the website:

PoolParty is a thesaurus management system and a SKOS editor for the Semantic Web including text mining and linked data capabilities. The system helps to build and maintain multilingual thesauri providing an easy-to-use interface. PoolParty server provides semantic services to integrate semantic search or recommender systems into enterprise systems like CMS, web shops, CRM or Wikis.

I encountered PoolParty in the video Pool Party – Semantic Search.

The video glosses over a lot of difficulties, but what effective advertising doesn’t?

Curious if anyone is familiar with this group/product?

Update: 31 May 2011

Slides: Pool Party – Semantic Search

Nice slide deck on semantic search issues.


Sunday, October 31st, 2010



SKOS is a Semantic Web framework for representing thesauri, classification schemes, subject heading systems, controlled vocabularies, and taxonomies. It enables novel ways of representing terminological knowledge and its linkage with domain knowledge in unambiguous, reusable, and encapsulated fashion within computer applications. According to the National Library of Medicine, the UMLS Knowledge Source (UMLS-KS) integrates and distributes key terminology, classification and coding standards, and associated resources to promote creation of more effective and interoperable biomedical information systems and services “that behave as if they ‘understand’ the meaning of the language of biomedicine and health”. However the current information representation model utilized by UMLS-KS itself is not conducive to computer programs effectively retrieving and automatically and unambiguously interpreting the ‘meaning’ of the biomedical terms and concepts and their relationships.

In this presentation we propose using Simple Knowledge Organization System (SKOS) as an alternative to represent the body of knowledge incorporated within the UMLS-KS within the framework of the Semantic Web technologies. We also introduce our conceptualization of a transformation algorithm to produce an SKOS representation of the UMLS-KS that integrates UMLS-Semantic Network, the UMLS-Metathesaurus complete with all its source vocabularies as a unified body of knowledge along with appropriate information to trace or segregate information based on provenance and governance information. Our proposal and method is based on the idea that formal and explicit representation of any body of knowledge enables its unambiguous, and precise interpretation by automated computer programs. The consequences of such undertaking would be at least three fold: 1) ability to automatically check inconsistencies and errors within a large and complex body of knowledge, 2) automated information interpretation, integration, and discovery, and 3) better information sharing, repurposing and reusing (adoption), and extending the knowledgebase within a distributed and collaborative community of researchers. We submit that UMLS-KS is no exception to this and may benefit from all those advantages if represented fully using a formal representation language. Using SKOS in combination with the transformation algorithm introduced in this presentation are our first steps in that direction. We explain our conceptualization of the algorithms, problems we encountered and how we addressed them with a brief gap analysis to outline the road ahead of us. At the end we also present several use cases from our laboratories at the School of Health information Sciences utilizing this artifact.

WebEx Recording Presentation


The slides are good but you will need to watch the presentation to give them context.

My only caution concerns:

Our proposal and method is based on the idea that formal and explicit representation of any body of knowledge enables its unambiguous, and precise interpretation by automated computer programs.

I don’t doubt that our computers can return “unambiguous, and precise interpretation[s]” but that isn’t the same thing as “correct” interpretations.