Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 24, 2012

LDIF – Linked Data Integration Framework (0.4)

Filed under: Hadoop,Heterogeneous Data,LDIF,Linked Data — Patrick Durusau @ 3:43 pm

LDIF – Linked Data Integration Framework (0.4)

Version 0.4 News:

Up till now, LDIF stored data purely in-memory, which restricted the amount of data that could be processed. Version 0.4 provides two alternative implementations of the LDIF runtime environment that allow LDIF to scale to large data sets:

  1. The new triple-store-backed implementation scales to larger data sets on a single machine, with lower memory consumption at the expense of processing time.
  2. The new Hadoop-based implementation provides for processing very large data sets on a Hadoop cluster, for instance within Amazon EC2.

A comparison of the performance of all three implementations of the runtime environment can be found on the LDIF benchmark page.

From the “About LDIF” page:

The Web of Linked Data grows rapidly and contains data from a wide range of different domains, including life science data, geographic data, government data, library and media data, as well as cross-domain data sets such as DBpedia or Freebase. Linked Data applications that want to consume data from this global data space face the challenges that:

  1. data sources use a wide range of different RDF vocabularies to represent data about the same type of entity.
  2. the same real-world entity, for instance a person or a place, is identified with different URIs within different data sources.

This use of different vocabularies, together with the use of URI aliases, makes it very cumbersome for an application developer to write SPARQL queries against Web data that originates from multiple sources. To ease the use of Web data in the application context, it is thus advisable to translate data to a single target vocabulary (vocabulary mapping) and to replace URI aliases with a single target URI on the client side (identity resolution) before issuing SPARQL queries against the data.

Up till now, there have not been any integrated tools that help application developers with these tasks. With LDIF, we try to fill this gap and provide an open-source Linked Data Integration Framework that can be used by Linked Data applications to translate Web data and normalize URIs while keeping track of data provenance.
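The two steps the LDIF authors describe, vocabulary mapping and identity resolution, can be sketched in a few lines. This is a minimal illustration only: the URIs and mapping tables below are invented for this example, and LDIF itself drives these steps with declarative mappings and linkage rules rather than hand-written dictionaries.

```python
# Triples from two hypothetical sources describing the same person
# with different vocabularies and different URIs.
triples = [
    ("http://srcA/person/42", "http://xmlns.com/foaf/0.1/name", "Ada Lovelace"),
    ("http://srcB/people/ada", "http://srcB/vocab/fullName", "Ada Lovelace"),
]

# Vocabulary mapping: translate source predicates into one target vocabulary.
predicate_map = {
    "http://srcB/vocab/fullName": "http://xmlns.com/foaf/0.1/name",
}

# Identity resolution: replace URI aliases with a single target URI.
uri_aliases = {
    "http://srcB/people/ada": "http://srcA/person/42",
}

def integrate(triples, predicate_map, uri_aliases):
    """Normalize predicates and URIs, letting duplicate triples collapse."""
    out = set()
    for s, p, o in triples:
        s = uri_aliases.get(s, s)          # resolve subject aliases
        p = predicate_map.get(p, p)        # map predicate to target vocabulary
        o = uri_aliases.get(o, o)          # resolve object aliases (if a URI)
        out.add((s, p, o))
    return sorted(out)

print(integrate(triples, predicate_map, uri_aliases))
# Both source triples collapse into one:
# [('http://srcA/person/42', 'http://xmlns.com/foaf/0.1/name', 'Ada Lovelace')]
```

After normalization, a single SPARQL query over the merged data no longer has to know which source a triple came from, which is the point of doing the translation before querying.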

With the addition of Hadoop-based processing, it is definitely worth your time to download LDIF and see what you think of it.

Ironic that the problems it solves:

  1. data sources use a wide range of different RDF vocabularies to represent data about the same type of entity.
  2. the same real-world entity, for instance a person or a place, is identified with different URIs within different data sources.

already existed, prior to Linked Data, as:

  1. data sources use a wide range of different vocabularies to represent data about the same type of entity.
  2. the same real-world entity, for instance a person or a place, is identified differently within different data sources.

So the Linked Data drill is to convert data, which already has these problems, into Linked Data, which will still have these problems, and then solve the problem of differing identifications.

Yes?

Did I miss a step?
