Is Linked Data the future of data integration in the enterprise? by John Walker.
From the post:
Following the basic Linked Data principles we have assigned HTTP URIs as names for things (resources) providing an unambiguous identifier. Next up we have converted data from a variety of sources (XML, CSV, RDBMS) into RDF.
One of the key features of RDF is the ability to easily merge data about a single resource from multiple sources into a single “supergraph” providing a more complete description of the resource. By loading the RDF into a graph database, it is possible to make an endpoint available which can be queried using the SPARQL query language. We are currently using Dydra as their cloud-based database-as-a-service model provides an easy entry route to using RDF without requiring a steep learning curve (basically load your RDF and you’re away), but there are plenty of other options like Apache Jena and OpenRDF Sesame. This has made it very easy for us to answer complex questions requiring data from multiple sources; moreover, we can stand up APIs providing access to this data in minutes.
By using a Linked Data Platform such as Graphity we can make our identifiers (HTTP URIs) dereferenceable. In layman’s terms, when someone plugs the URI into a browser, we provide a description of the resource in HTML. Using content negotiation we are able to provide this data in one of the standard machine-readable XML, JSON or Turtle formats. Graphity uses Java and XSLT 2.0 which our developers already have loads of experience with and provides powerful mechanisms with which we will be able to develop some great web apps.
What do you make of:
One of the key features of RDF is the ability to easily merge data about a single resource from multiple sources into a single “supergraph” providing a more complete description of the resource.
???
I suppose if by some accident we all use the same URI as an identifier, that would be the case. But that hardly requires URIs, Linked Data or RDF.
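A minimal sketch of that point, assuming graphs modelled as plain Python sets of triples rather than a real RDF store. The URIs and the `merge` and `describe` helpers are hypothetical, made up purely for illustration; nothing here comes from the original post.

```python
# Each "graph" is a set of (subject, predicate, object) triples.
# All URIs and helper names are hypothetical, for illustration only.

PRODUCT = "http://example.com/id/product/42"
NAME = "http://example.com/ns#name"
PRICE = "http://example.com/ns#price"


def merge(*graphs):
    """Merge graphs by set union (sufficient for triples without blank nodes)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return the set of (predicate, object) pairs stated about `subject`."""
    return {(p, o) for s, p, o in graph if s == subject}


# Source A names the product.
source_a = {(PRODUCT, NAME, "Widget")}

# Source B prices the same product and happens to use the same URI.
source_b_same_uri = {(PRODUCT, PRICE, "9.99")}

# Source B prices the same product but coined its own URI for it.
source_b_other_uri = {("http://example.org/items/widget", PRICE, "9.99")}

# Same URI: the union yields one richer description of the product.
merged_same = merge(source_a, source_b_same_uri)

# Different URIs: the union holds both triples, but nothing connects them,
# so the description of PRODUCT is no more complete than before.
merged_other = merge(source_a, source_b_other_uri)

print(sorted(describe(merged_same, PRODUCT)))
print(sorted(describe(merged_other, PRODUCT)))
```

The merge itself is nothing more than a set union keyed on the identifier. All of the work lies in getting the sources to agree on that identifier, and any shared key would serve equally well.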
Scientific conferences on digital retrieval in the 1950s worried about diversity of nomenclature being a barrier to the discovery of resources. If we haven’t addressed the semantic diversity issue in sixty (60) years of talking about it, it isn’t clear how creating another set of diverse names is going to help.
There may be other reasons for using URIs but seamless merging doesn’t appear to be one of them.
Moreover, how do I know what you have identified with a URI?
You can return one or more properties for a URI, but which ones matter for the identity of the subject it identifies?
I first saw this at Linked Data: The Future of Data Integration by Angela Guess.