Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 7, 2013

The Semantic Web Is Failing — But Why? (Part 5)

Filed under: Identity,OWL,RDF,Semantic Web — Patrick Durusau @ 4:30 pm

Impoverished Identification by URI

There is one final part of the failure of the Semantic Web puzzle to explore before we can talk about a solution.

In owl:sameAs and Linked Data: An Empirical Study, Ding, Shinavier, Finin and McGuinness write:

Our experimental results have led us to identify several issues involving the owl:sameAs property as it is used in practice in a linked data context. These include how best to manage owl:sameAs assertions from “third parties”, problems in merging assertions from sources with different contexts, and the need to explore an operational semantics distinct from the strict logical meaning provided by OWL.

To resolve varying usages of owl:sameAs, the authors go beyond identifications provided by a URI to look to other properties. For example:

Many owl:sameAs statements are asserted due to the equivalence of the primary feature of resource description, e.g. the URIs of FOAF profiles of a person may be linked just because they refer to the same person even if the URIs refer the person at different ages. The odd mashup on job-title in previous section is a good example for why the URIs in different FOAF profiles are not fully equivalent. Therefore, the empirical usage of owl:sameAs only captures the equivalence semantics on the projection of the URI on social entity dimension (removing the time and space dimensions). In this way, owl:sameAs is used to indicate partial equivalence between two different URIs, which should not be considered as full equivalence.

Knowing the dimensions covered by a URI and the dimensions covered by a property, it is possible to conduct better data integration using owl:sameAs. For example, since we know a URI of a person provides a temporal-spatial identity, descriptions using time-sensitive properties, e.g. age, height and workplace, should not be aggregated, while time-insensitive properties, such as eye color and social security number, may be aggregated in most cases.

When an identification is insufficient based on a single URI, additional properties can be considered.

My question then is why do ordinary users have to wait for experts to decide their identifications are insufficient? Why can’t we empower users to declare multiple properties, including URIs, as a means of identification?

It could be something as simple as JSON key/value pairs with a notation of “+” for must match, “-” for must not match, and “?” for optional to match.
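Here is a minimal sketch of how such a declaration might be checked, in Python. The property names and values are hypothetical, purely for illustration:

# Hedged sketch of a user-declared identification: "+" means must match,
# "-" means must not match, "?" means optional. All keys/values are made up.

identification = {
    "+http://purl.org/dc/terms/title": "The Semantic Web Is Failing",
    "+author": "Patrick Durusau",
    "-language": "fr",
    "?homepage": "http://tm.durusau.net/",
}

def matches(declaration, candidate):
    """Return True if a candidate dict of properties satisfies the declaration."""
    for marked_key, value in declaration.items():
        marker, key = marked_key[0], marked_key[1:]
        if marker == "+" and candidate.get(key) != value:
            return False   # a required property is missing or different
        if marker == "-" and candidate.get(key) == value:
            return False   # a forbidden property is present
        # "?" properties never cause a failure; they only add information
    return True

candidate = {
    "http://purl.org/dc/terms/title": "The Semantic Web Is Failing",
    "author": "Patrick Durusau",
    "language": "en",
}
print(matches(identification, candidate))   # True

The point is not the notation but who writes the declaration: the author, not a third-party expert, decides which properties identify the subject.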

A declaration of identity by users about the subjects in their documents. Who better to ask?

Not to mention that the more information a user supplies for an identification, the more likely they are to communicate successfully with other users.

URIs may be Tim Berners-Lee’s nails, but they are insufficient to support the scaffolding required for robust communication.


The next series starts with Saving the “Semantic” Web (Part 1)

The Semantic Web Is Failing — But Why? (Part 4)

Filed under: Interface Research/Design,RDF,Semantic Web — Patrick Durusau @ 4:30 pm

Who Authors The Semantic Web?

With the explosion of data, “big data” to use the oft-abused terminology, authoring semantics cannot be solely the province of a smallish band of experts.

Ordinary users must be enabled to author semantics on subjects of importance to them, without expert supervision.

The Semantic Web is designed for the semantic equivalent of:

F16 Cockpit

An F-16 cockpit has an interface that some people can use, but hardly the average user.

VW Dashboard

The VW “Beetle” has an interface used by a large number of average users.

Using a VW interface, users still have accidents, disobey rules of the road, lock their keys inside and make other mistakes. But the number of users who can use the VW interface is several orders of magnitude greater than the F-16/RDF interface.

Designing a solution that only experts can use, if participation by average users is a goal, is a path to failure.


The next series starts with Saving the “Semantic” Web (Part 1)

The Semantic Web Is Failing — But Why? (Part 3)

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 4:30 pm

Is Linked Data the Answer?

Leaving the failure of users to understand RDF semantics to one side, there is also the issue of the complexity of its various representations.

Consider Kingsley Idehen’s “simple” example Turtle document, which he posted in: Simple Linked Data Deployment via Turtle Docs using various Storage Services:

##### Starts Here #####
# Note: the hash is a comment character in Turtle
# Content start
# You can save this to a local file. In my case I use Local File Name: kingsley.ttl .
# Actual Content:

# prefix declarations that enable the use of compact identifiers instead of fully expanded 
# HTTP URIs.

@prefix owl:   .
@prefix foaf:  .
@prefix rdfs:  . 
@prefix wdrs:  .
@prefix opl:  .
@prefix cert:  .
@prefix:<#>.

# Profile Doc Stuff

<> a foaf:Document . 
<> rdfs:label "DIY Linked Data Doc About: kidehen" .
<> rdfs:comment "Simple Turtle File That Describes Entity: kidehen " .

# Entity Me Stuff

<> foaf:primaryTopic :this .
<> foaf:maker :this .
:this a foaf:Person . 
:this wdrs:describedby <> . 
:this foaf:name "Kingsley Uyi Idehen" .
:this foaf:firstName "Kingsley" .
:this foaf:familyName "Idehen" .
:this foaf:nick "kidehen" .
:this owl:sameAs  .
:this owl:sameAs  .
:this owl:sameAs  .
:this owl:sameAs  .
:this foaf:page  .
:this foaf:page  .
:this foaf:page  .
:this foaf:page  . 
:this foaf:knows , , , , ,  .

# Entity Me: Identity & WebID Stuff 

#:this cert:key :pubKey .
#:pubKey a cert:RSAPublicKey;
# Public Key Exponent
# :pubkey cert:exponent "65537" ^^ xsd:integer;
# Public Key Modulus
# :pubkey cert:modulus "d5d64dfe93ab7a95b29b1ebe21f3cd8a6651816c9c39b87ec51bf393e4177e6fc
2ee712d92caf9d9f1423f5e65f127274529a2e6cc53f1e452c6736e8db8732f919c4160eaa9b6f327c8617c
40036301b547abfc4c5de610780461b269e3d8f8e427237da6152ac2047d88ff837cddae793d15427fa7ce
067467834663737332be467eb353be678bffa7141e78ce3052597eae3523c6a2c414c2ae9f8d7be807bb3
fc0d516b8ecd2fafee4f20ff3550919601a0ad5d29126fb687c2e8c156f04918a92c4fc09f136473f3303814e1
83185edf0046e124e856ca7ada027345e614f8d665f5d7172d880497005ff4626c2b0f2206f7dce717e4f279
dd2a0ddf04b" ^^ xsd:hexBinary .

# :this opl:hasCertificate :cert .
# :cert opl:fingerprint "640F9DD4CFB6DD6361CBAD12C408601E2479CC4A" ^^ xsd:hexBinary;
#:cert opl:hasPublicKey "d5d64dfe93ab7a95b29b1ebe21f3cd8a6651816c9c39b87ec51bf393e4177e6fc2
ee712d92caf9d9f1423f5e65f127274529a2e6cc53f1e452c6736e8db8732f919c4160eaa9b6f327c8617c400
36301b547abfc4c5de610780461b269e3d8f8e427237da6152ac2047d88ff837cddae793d15427fa7ce06746
7834663737332be467eb353be678bffa7141e78ce3052597eae3523c6a2c414c2ae9f8d7be807bb3fc0d516b
8ecd2fafee4f20ff3550919601a0ad5d29126fb687c2e8c156f04918a92c4fc09f136473f3303814e183185edf00
46e124e856ca7ada027345e614f8d665f5d7172d880497005ff4626c2b0f2206f7dce717e4f279dd2a0ddf04b" 
^^ xsd:hexBinary .

##### Ends Here #####

Try handing that “simple” example and Idehen’s article to some non-technical person in your office to gauge its “simplicity.”

For that matter, hand it to some of your technical but non-Semantic Web folks as well.

Your experience with that exercise will speak louder than anything I can say.


The next series starts with Saving the “Semantic” Web (Part 1)

The Semantic Web Is Failing — But Why? (Part 2)

Filed under: RDF,Semantic Web — Patrick Durusau @ 4:30 pm

Should You Be Using RDF?

Pat Hayes (editor of RDF Semantics) and Richard Cyganiak (a linked data expert) had this interchange on the RDF Working Group discussion list:

Cyganiak: The text stresses that the presence of an ill-typed literals does not constitute an inconsistency.

Cyganiak: But why does the distinction matter?

Hayes: I am not sure what you mean by “the distinction” here. Why would you expect that an ill-typed literal would produce an inconsistency? Why would the presence of an ill-typed literal make a triple false?

Cyganiak: Is there any reason anybody needs to know about this distinction who isn’t interested in the arcana of the model theory?

Hayes: I’m not sure what you consider to be “arcana”. Someone who cannot follow the model theory probably shouldn’t be using RDF. (emphasis added) Re: Ill-typed vs. inconsistent? (Mon, 12 Nov 2012 01:58:51 -0600)

When challenged on the need to follow model theory, Hayes retreats, but only slightly:

Well, it was rather late and I had just finished driving 2400 miles so maybe I was a bit abrupt. But I do think that anyone who does not understand what “inconsistent” means should not be using RDF, or at any rate should only be using it under the supervision of someone who *does* know the basics of semantic notions. Its not like nails versus metallurgy so much as nails versus hammers. If you are trying to push the nails in by hand, you probably need to hire a framer. (emphasis added) Re: Ill-typed vs. inconsistent? (Mon, 12 Nov 2012 09:58:52 -0600)

A portion of the Introduction to RDF Semantics reads:

RDF is an assertional language intended to be used to express propositions using precise formal vocabularies, particularly those specified using RDFS [RDF-VOCABULARY], for access and use over the World Wide Web, and is intended to provide a basic foundation for more advanced assertional languages with a similar purpose. The overall design goals emphasise generality and precision in expressing propositions about any topic, rather than conformity to any particular processing model: see the RDF Concepts document [RDF-CONCEPTS] for more discussion.

Exactly what is considered to be the ‘meaning’ of an assertion in RDF or RDFS in some broad sense may depend on many factors, including social conventions, comments in natural language or links to other content-bearing documents. Much of this meaning will be inaccessible to machine processing and is mentioned here only to emphasize that the formal semantics described in this document is not intended to provide a full analysis of ‘meaning’ in this broad sense; that would be a large research topic. The semantics given here restricts itself to a formal notion of meaning which could be characterized as the part that is common to all other accounts of meaning, and can be captured in mechanical inference rules.

This document uses a basic technique called model theory for specifying the semantics of a formal language. Readers unfamiliar with model theory may find the glossary in appendix B helpful; throughout the text, uses of terms in a technical sense are linked to their glossary definitions. Model theory assumes that the language refers to a ‘world‘, and describes the minimal conditions that a world must satisfy in order to assign an appropriate meaning for every expression in the language. A particular world is called an interpretation, so that model theory might be better called ‘interpretation theory’. The idea is to provide an abstract, mathematical account of the properties that any such interpretation must have, making as few assumptions as possible about its actual nature or intrinsic structure, thereby retaining as much generality as possible. The chief utility of a formal semantic theory is not to provide any deep analysis of the nature of the things being described by the language or to suggest any particular processing model, but rather to provide a technical way to determine when inference processes are valid, i.e. when they preserve truth. This provides the maximal freedom for implementations while preserving a globally coherent notion of meaning.

Model theory tries to be metaphysically and ontologically neutral. It is typically couched in the language of set theory simply because that is the normal language of mathematics – for example, this semantics assumes that names denote things in a set IR called the ‘universe‘ – but the use of set-theoretic language here is not supposed to imply that the things in the universe are set-theoretic in nature. Model theory is usually most relevant to implementation via the notion of entailment, described later, which makes it possible to define valid inference rules.

Readers should read RDF Semantics to answer for themselves whether they understand “inconsistent” as defined therein. Noting that Richard Cyganiak, a linked data expert, did not.


The next series starts with Saving the “Semantic” Web (Part 1)

The Semantic Web Is Failing — But Why? (Part 1)

Filed under: Identity,OWL,RDF,Semantic Web — Patrick Durusau @ 4:29 pm

Introduction

Before proposing yet another method for identification and annotation of entities in digital media, it is important to draw lessons from existing systems. Failing systems in particular, so their mistakes are not repeated or compounded. The Semantic Web is an example of such a system.

Doubters of that claim should read the report Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus by Web Data Commons.

Web Data Commons is a structured data research project based at the Research Group Data and Web Science at the University of Mannheim and the Institute AIFB at the Karlsruhe Institute of Technology. Supported by PlanetData and LOD2 research projects, the Web Data Commons is not opposed to the Semantic Web.

But the Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus document reports:

Altogether we discovered structured data within 369 million of the 3 billion pages contained in the Common Crawl corpus (12.3%). The pages containing structured data originate from 2.29 million among the 40.5 million websites (PLDs) contained in the corpus (5.65%). Approximately 519 thousand websites use RDFa, while only 140 thousand websites use Microdata. Microformats are used on 1.7 million websites. It is interesting to see that Microformats are used by approximately 2.5 times as many websites as RDFa and Microdata together. (emphasis added)

To sharpen the point, RDFa is 1.28% of the 40.5 million websites, eight (8) years after its introduction (2004) and four (4) years after reaching Recommendation status (2008).

Or more generally:

Parsed HTML URLs 3,005,629,093
URLs with Triples 369,254,196

Or in a layperson’s terms, for this web corpus, parsed HTML URLs outnumber URLs with Triples by approximately eight to one.

Being mindful that the corpus is only web accessible data and excludes “dark data,” the need for a more robust solution than the Semantic Web is self-evident.

The failure of the Semantic Web is no assurance that any alternative proposal will fare better. Understanding why the Semantic Web is failing is a prerequisite to any successful alternative.


Before you “flame on,” you might want to read the entire series. I end up with a suggestion based on work by Ding, Shinavier, Finin and McGuinness.


The next series starts with Saving the “Semantic” Web (Part 1)

February 4, 2013

Introduction to: Triplestores [Perils of Inferencing]

Filed under: RDF,Triplestore — Patrick Durusau @ 3:19 pm

Introduction to: Triplestores by Juan Sequeda.

From the post:

Triplestores are Database Management Systems (DBMS) for data modeled using RDF. Unlike Relational Database Management Systems (RDBMS), which store data in relations (or tables) and are queried using SQL, triplestores store RDF triples and are queried using SPARQL.

A key feature of many triplestores is the ability to do inference. It is important to note that a DBMS typically offers the capacity to deal with concurrency, security, logging, recovery, and updates, in addition to loading and storing data. Not all Triplestores offer all these capabilities (yet).

Unless you have been under a rock or in another dimension, triplestores are not news.

The post gives a short list of some of the more popular triplestores and illustrates one of the problems with “inferencing,” inside or outside of a triplestore.

The inference in this article says that “full professors,” “assistant professors,” and “teachers” are all “professors.”

Suggest you drop by the local university to see if “full professors” think of instructors or “adjunct professors” as “professors.”

BTW, the “inferencing” is “correct” as far as the OWL ontology in the article goes. But that’s part of the problem.

Being “correct” in OWL may or may not have any relationship to the world as you experience it.
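The mechanics are easy enough to reproduce. A minimal sketch with rdflib, using made-up class names rather than the ontology from the article:

from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")

g = Graph()
# Toy ontology: both classes are declared subclasses of ex:Professor.
g.add((EX.FullProfessor, RDFS.subClassOf, EX.Professor))
g.add((EX.AdjunctProfessor, RDFS.subClassOf, EX.Professor))
# Instance data.
g.add((EX.pat, RDF.type, EX.AdjunctProfessor))

# One pass of the RDFS subclass rule (rdfs9):
# if ?x rdf:type ?c and ?c rdfs:subClassOf ?d, then ?x rdf:type ?d.
inferred = set()
for x, _, c in g.triples((None, RDF.type, None)):
    for _, _, d in g.triples((c, RDFS.subClassOf, None)):
        inferred.add((x, RDF.type, d))
for triple in inferred:
    g.add(triple)

print((EX.pat, RDF.type, EX.Professor) in g)   # True -- "correct," as far as the ontology goes

Whether anyone outside the ontology would call an adjunct a “professor” is exactly the question the triples cannot answer.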


My wife reminded me at lunch that piano players in whore houses around the turn of the 19th century were also called “professor.”

Another inference not accounted for.

January 21, 2013

RDF 1.1 Concepts and Abstract Syntax [New Draft]

Filed under: RDF,Semantic Web — Patrick Durusau @ 7:23 pm

RDF 1.1 Concepts and Abstract Syntax

From the introduction:

The Resource Description Framework (RDF) is a framework for representing information in the Web.

RDF 1.1 Concepts and Abstract Syntax defines an abstract syntax (a data model) which serves to link all RDF-based languages and specifications. The abstract syntax has two key data structures: RDF graphs are sets of subject-predicate-object triples, where the elements may be IRIs, blank nodes, or datatyped literals. They are used to express descriptions of resources. RDF datasets are used to organize collections of RDF graphs, and comprise a default graph and zero or more named graphs. This document also introduces key concepts and terminology, and discusses datatyping and the handling of fragment identifiers in IRIs within RDF graphs.
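If you want to see the two key data structures side by side, here is a minimal rdflib sketch (the graph name and properties are illustrative):

from rdflib import Dataset, Namespace, Literal, URIRef
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/")

ds = Dataset()   # an RDF dataset: a default graph plus zero or more named graphs

# Default graph: a set of subject-predicate-object triples.
ds.add((EX.doc, RDFS.label, Literal("A sample document", lang="en")))

# A named graph in the same dataset, identified by an IRI.
meta = ds.graph(URIRef("http://example.org/graphs/metadata"))
meta.add((EX.doc, EX.creator, EX.someone))

print(ds.serialize(format="trig"))   # shows the default and named graphs together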

Numerous issues await your comments and suggestions.

January 15, 2013

Importing RDF into Faunus

Filed under: Faunus,RDF — Patrick Durusau @ 8:32 pm

RDF Format

Description of RDFInputFormat for Faunus to convert the edge list format of RDF into the adjacency list used by Faunus.

Currently supports:

  • rdf-xml
  • n-triples
  • turtle
  • n3
  • trix
  • trig

The converter won’t help with the lack of specified identification properties.

But, format conversion can’t increase the amount of information stored in a format.

At best it can be lossless.

Chinese Rock Music

Filed under: Music,OWL,RDF,Semantic Web — Patrick Durusau @ 8:30 pm

Experiences on semantifying a Mediawiki for the biggest resource about Chinese rock music: rockinchina.com by René Pickhardt.

From the post:

During my trip in China I was visiting Beijing on two weekends and Maceau on another weekend. These trips have been mainly motivated to meet old friends. Especially the heads behind the biggest English resource of Chinese Rock music Rock in China who are Max-Leonhard von Schaper and the founder of the biggest Chinese Rock Print Magazin Yang Yu. After looking at their wiki which is pure gold in terms of content but consists mainly of plain text I introduced them the idea of putting semantics inside the project. While consulting them a little bit and pointing them to the right resources Max did basically the entire work (by taking a one month holiday from his job. Boy this is passion!).

I am very happy to anounce that the data of rock in china is published as linked open data and the process of semantifying the website is in great shape. In the following you can read about Max experiences doing the work. This is particularly interesting because Max has no scientific background in semantic technologies. So we can learn a lot on how to improve these technologies to be ready to be used by everybody:

Good to see that René hasn’t lost his touch for long blog titles. 😉

A very valuable lesson in the difficulties posed by current “semantic” technologies.

Max and company succeed, but only after heroic efforts.

December 21, 2012

SPARQL end-point of data.europeana.eu

Filed under: Library,Museums,RDF,SPARQL — Patrick Durusau @ 3:22 pm

SPARQL end-point of data.europeana.eu

From the webpage:

Welcome on the SPARQL end-point of data.europeana.eu!

data.europeana.eu currently contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana. Data is following the terms of the Creative Commons CC0 public domain dedication. Data is described the Resource Description Framework (RDF) format, and structured using the Europeana Data Model (EDM). We give more detail on the EDM data we publish on the technical details page.

Please take the time to check out the list of collections currently included in the pilot.

The terms of use and external data sources appearing at data.europeana.eu are provided on the Europeana Data sources page.

Sample queries are available on the sparql page.

At first I wondered why this was news, because Europeana opens up data on 20 million cultural items appeared in the Guardian on 12 September 2012.

I assume the data has been in use since its release last September.

If you have been using it, can you comment on how your use will change now that the data is available as a SPARQL end-point?
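If you want to compare, here is a minimal sketch of querying the endpoint with SPARQLWrapper. The endpoint address below is a placeholder, so check the sparql page for the actual one:

from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder address -- use the endpoint URL given on the data.europeana.eu sparql page.
endpoint = SPARQLWrapper("http://data.europeana.eu/sparql")
endpoint.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])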

December 20, 2012

Best Buy Product Catalog via Semantic Endpoints

Filed under: Linked Data,RDF — Patrick Durusau @ 2:31 pm

Announcing BBYOpen Metis Alpha: Best Buy Product Catalog via Semantic Endpoints

From the post:

Announcing BBYOpen Metis Alpha: Best Buy Product Catalog via Semantic Endpoints

These days, consumers have a rich variety of products available at their fingertips. A massive product landscape has evolved, but sadly products in this enormous and rich landscape often get flattened to just a price tag. Over time, it seems the product value proposition, variety, descriptions, specifics, and details that make up products have all but disappeared. This presents consumers with a "paradox of choice" where misinformed decisions can lead to poor product selections, and ultimately product returns and customer remorse.

To solve this problem, BBY Open is excited to announce the first phase Alpha release of Metis, our semantically-driven product insight engine. As part of a phased release approach, this first release consists of publishing all 500K+ of our active Best Buy products with reviews as RDF-enabled endpoints for public consumption.

This alpha release is the first phase in solving this product ambiguity. With the publishing of structured product data in RDF format using industry accepted product ontologies like GoodRelations, standards from the Semantic Web group at the W3C, and the NetKernel platform, the Metis Alpha gives developers the ability to consume and query structured data via SPARQL (get up to speed with Learning SPARQL by Bob DuCharme), enabling the discovery of insight hidden deep inside the product catalog.

Comments?

December 5, 2012

YASGUI: Web-based SPARQL client with bells ‘n wistles

Filed under: RDF,SPARQL — Patrick Durusau @ 4:13 pm

YASGUI: Web-based SPARQL client with bells ‘n wistles

From the post:

A few months ago Laurens Rietveld was looking for a query interface from which he could easily query any other SPARQL endpoint.

But he couldn’t find any that fit his requirements:

So he decided to make his own!

Give it a try at: http://aers.data2semantics.org/sparql/

Future work (next year probably):

In case you are interested in SPARQL per se or want to extract information for re-use in a topic map. Could be interesting.

Good to see mention of our friends at Mondeca.

Normalizing company names with SPARQL and DBpedia

Filed under: DBpedia,RDF,SPARQL — Patrick Durusau @ 12:01 pm

Normalizing company names with SPARQL and DBpedia

Bob DuCharme writes:

Wikipedia page redirection data, waiting for you to query it.

If you send your browser to http://en.wikipedia.org/wiki/Big_Blue, you’ll end up at IBM’s page, because Wikipedia knows that this nickname usually refers to this company. (Apparently, it’s also a nickname for several high schools and universities.) This data pointing from nicknames to official names is also stored in DBpedia, which means that we can use SPARQL queries to normalize company names. You can use the same technique to normalize other kinds of names—for example, trying to send your browser to http://en.wikipedia.org/wiki/Bobby_Kennedy will actually send it to http://en.wikipedia.org/wiki/Robert_F._Kennedy—but a query that sticks to one domain will have a simpler job. Description Logics and all that.
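A minimal sketch of the kind of lookup Bob describes, against the public DBpedia endpoint (this is my simplification, not his exact query):

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?official ?label
    WHERE {
      <http://dbpedia.org/resource/Big_Blue> dbo:wikiPageRedirects ?official .
      ?official rdfs:label ?label .
      FILTER (lang(?label) = "en")
    }
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["official"]["value"], row["label"]["value"])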

As always Bob is on the cutting edge of the use of a markup standard!

Possible topic map analogies:

  • create a second name cluster and the “normalized name” is an additional base name
  • move the “nickname” to a variant name (scope?) and update the base name to be the normalized name (with changes to sort/display as necessary)

I am assuming that Bob’s lang(?redirectsTo) = "en" operates like scope in topic maps.

Except that scope in topic maps is represented by one or more topics, which means merging can occur between topics that represent the same language.

November 27, 2012

Linking your resources to the Data Web

Filed under: AGROVOC,Linked Data,RDF — Patrick Durusau @ 4:56 am

First LOD@AIMS Webinar with Tom Baker on “Linking your resources to the Data Web”

4th December 2012 – 16:00 Rome Time

From the post:

The AIMS Metadata Community of Practice is glad to announce the first Linked Open Data @ AIMS webinar entitled Linking your resources to the Data Web. The session will take place on 4th December 2012 – 16:00 Rome Time – and will be presented by Tom Baker, chief information officer (CIO) of the Dublin Core Metadata Initiative (DCMI).

This event is part of the series of webinars Linked Open Data @ AIMS that will take place from December 2012 to February 2013. A total of 6 specialists will talk about Linked Open Data and the Semantic Web to the agricultural information management community. The webinars will be in the 6 languages used on AIMS – English, French, Spanish, Arabic, Chinese and Russian.

The objective of Linked Open Data @ AIMS webinars is to help individuals and organizations to understand better the initiatives related to the Semantic Web that are currently taking place within the AIMS Communities of Practice.


Linking data into the Semantic Web means more than just making data available on a Web server. It means using Web addresses (URIs) in data as names for things; tagging resources using those URIs – for example, URIs for agricultural topics from AGROVOC; and using URIs to point to related resources.

This talk walks through a simple example to show how linking works in practice, illustrating RDF technology with animated graphics. It concludes with a recipe for linking your data: Decide what bits of your data are most important, such as Subject, Author, and Publisher. Use URIs in your data, whenever possible, such as Subject terms from AGROVOC. Then publish your data in RDF on the Web where others can link to it. Simple solutions can be enough to yield good results.

Tom Baker of the Dublin Core Metadata Initiative will be an excellent speaker but when I saw:

Tom Baker on “Linking your resources to the Data Web”

my first thoughts were of another Tom Baker and wondering how he had gotten involved with Linked Data. 😉

In the body of the announcement, a URL identifies the “Tom Baker” in the text as another “Tom Baker” than the one I was thinking about.

Interesting. It didn’t take Linked Data or RDF to make the distinction, only the <a> element plus an href attribute. Something to think about.

November 22, 2012

SDshare Community Group

Filed under: RDF,SDShare — Patrick Durusau @ 5:28 am

SDshare Community Group

From the webpage:

SDshare is a highly RESTful protocol for synchronization of RDF (and potentially other) data, by publishing feeds of data changes as Atom feeds.

A W3C community group on SDShare.

The current SDShare draft.

Its known issues.

Co-chaired by Lars Marius Garshol and Graham Moore.

November 21, 2012

Sindice SPARQL endpoint

Filed under: RDF,Sindice,SPARQL — Patrick Durusau @ 12:00 pm

Sindice SPARQL endpoint by Gabi Vulcu.

From an email by Gabi:

We have released a new version of the Sindice SPARQL endpoint (http://sparql.sindice.com/) with two new datasets: sudoc and yago

Below are the current dump datasets that are in the Sparql endpoint:

dataset_uri dataset_name
http://sindice.com/dataspace/default/dataset/dbpedia “dbpedia”
http://sindice.com/dataspace/default/dataset/medicare “medicare”
http://sindice.com/dataspace/default/dataset/whoiswho “whoiswho”
http://sindice.com/dataspace/default/dataset/sudoc “sudoc”
http://sindice.com/dataspace/default/dataset/nytimes “nytimes”
http://sindice.com/dataspace/default/dataset/ookaboo “ookaboo”
http://sindice.com/dataspace/default/dataset/europeana “europeana”
http://sindice.com/dataspace/default/dataset/basekb “basekb”
http://sindice.com/dataspace/default/dataset/geonames “geonames”
http://sindice.com/dataspace/default/dataset/wordnet “wordnet”
http://sindice.com/dataspace/default/dataset/dailymed “dailymed”
http://sindice.com/dataspace/default/dataset/reactome “reactome”
http://sindice.com/dataspace/default/dataset/yago “yago”

The list of crawled website datasets that have been rdf-ized and loaded into the Sparql endpoint can be found here [1]

Due to space limitation we limited both the amount of dump datasets to the ones in the above table and the websites datasets to the top 1000 domains based on the DING[3] score.

However, upon request, if someone needs a particular dataset (there are more to choose from here [4]), we can arrange to get it into the Sparql endpoint in the next release.

[1] https://docs.google.com/spreadsheet/ccc?key=0AvdgRy2el8d9dERUZzBPNEZIbVJTTVVIRDVUWHhKdWc
[2] https://docs.google.com/spreadsheet/ccc?key=0AvdgRy2el8d9dGhDaHMta0MtaG9vWWhhbTd5SVVaX1E
[3] http://ding.sindice.com
[4] https://docs.google.com/spreadsheet/ccc?key=0AvdgRy2el8d9dGhDaHMta0MtaG9vWWhhbTd5SVVaX1E#gid=0

You may also be interested in: sindice-dev — Sindice developers list.

November 20, 2012

DBpedia 3.8 Downloads

Filed under: DBpedia,RDF — Patrick Durusau @ 4:34 pm

DBpedia 3.8 Downloads

From the webpage:

This pages provides downloads of the DBpedia datasets. The DBpedia datasets are licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License. The downloads are provided as N-Triples and N-Quads, where the N-Quads version contains additional provenance information for each statement. All files are bzip2 1 packed.

I had to ask to find this one.

One interesting feature that would bear repetition elsewhere is the ability to see a sample of a data file.

For example, at Links to Wikipedia Article, next to “nt” (N-Triple), there is a “?” that when followed displays in part:

<http://dbpedia.org/resource/AccessibleComputing> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://en.wikipedia.org/wiki/AccessibleComputing> .
<http://en.wikipedia.org/wiki/AccessibleComputing> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/AccessibleComputing> .
<http://en.wikipedia.org/wiki/AccessibleComputing> <http://purl.org/dc/elements/1.1/language> "en"@en .
<http://dbpedia.org/resource/AfghanistanHistory> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://en.wikipedia.org/wiki/AfghanistanHistory> .
<http://en.wikipedia.org/wiki/AfghanistanHistory> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/AfghanistanHistory> .
<http://en.wikipedia.org/wiki/AfghanistanHistory> <http://purl.org/dc/elements/1.1/language> "en"@en .
<http://dbpedia.org/resource/AfghanistanGeography> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://en.wikipedia.org/wiki/AfghanistanGeography> .
<http://en.wikipedia.org/wiki/AfghanistanGeography> <http://xmlns.com/foaf/0.1/primaryTopic> <http://dbpedia.org/resource/AfghanistanGeography> .
<http://en.wikipedia.org/wiki/AfghanistanGeography> <http://purl.org/dc/elements/1.1/language> "en"@en .

Which enabled me to conclude that, for my purposes, the reverse pointing from DBpedia to Wikipedia was repetitious. And since the entire dataset is only for the English version of Wikipedia, the declaration of language was superfluous.

That may not be true for your intended use of DBpedia data.

My point being that seeing sample data allows a quick evaluation before downloading large amounts of data.

A feature I would like to see for other data sets.
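Where a publisher does not offer samples, you can approximate the feature by streaming just the first few lines of a dump before committing to the full download. A rough sketch (the URL is a placeholder, not a real download link):

import bz2
import requests

# Placeholder URL -- substitute the dump file you are evaluating.
url = "http://example.org/downloads/wikipedia_links_en.nt.bz2"

decompressor = bz2.BZ2Decompressor()
sample = b""

with requests.get(url, stream=True) as response:
    response.raise_for_status()
    for chunk in response.iter_content(chunk_size=64 * 1024):
        sample += decompressor.decompress(chunk)
        if sample.count(b"\n") >= 20:   # enough lines for a quick look
            break

for line in sample.splitlines()[:20]:
    print(line.decode("utf-8", errors="replace"))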

November 19, 2012

Update: TabLinker & UnTabLinker

Filed under: CSV,Excel,RDF,TabLinker/UnTabLinker — Patrick Durusau @ 2:48 pm

Update: TabLinker & UnTabLinker

From the post:

TabLinker, introduced in an earlier post, is a spreadsheet to RDF converter. It takes Excel/CSV files as input, and produces enriched RDF graphs with cell contents, properties and annotations using the DataCube and Open Annotation vocabularies.

TabLinker interprets spreadsheets based on hand-made markup using a small set of predefined styles (e.g. it needs to know what the header cells are). Work package 6 is currently investigating whether and how we can perform this step automatically.

Features:

  • Raw, model-agnostic conversion from spreadsheets to RDF
  • Interactive spreadsheet marking within Excel
  • Automatic annotation recognition and export with OA
  • Round-trip conversion: revive the original spreadsheet files from the produced RDF (UnTabLinker)

Even with conversion tools, the question has to be asked:

What was gained by the conversion? Yes, yes the data is now an RDF graph but what can I do now that I could not do before?

With the caveat that it has to be something I want to do.

November 15, 2012

OpenOffice, RDF and Graphs

Filed under: Graphs,OpenOffice,RDF — Patrick Durusau @ 5:41 pm

Creating a virtual RDF graph describing a set of OpenOffice spreadsheets with Apache Jena and Fuseki

by Pierre Lindenbaum

From the post:

In the current post, I will use the Jena API for RDF to implement a virtual RDF graph describing the content of a set of openoffice/libreoffice spreadsheets.

Now this is truly an intersection of interests for me!

BTW, ODF 1.2 has support for RDF based metadata. There hasn’t been a lot of implementation activity on it but that could change.

Particularly since the right application could allow users to do an “ontology/entity” check, just as you do a spell check now.

If you want to know what something means, why not ask the author?

I will be forwarding this link to the ODF crowd as we work towards ODF 1.3.

Highly recommended.

Stardog 1.1 Release

Filed under: RDF,SPARQL,Stardog — Patrick Durusau @ 4:53 pm

Stardog 1.1 Release

From the webpage:

Stardog is a fast, commercial RDF database: SPARQL for queries; OWL for reasoning; pure Java for the Enterprise.

Stardog 1.1 supports SPARQL 1.1.

I first saw this in a tweet from Kendall Clark.

November 12, 2012

An Ontological Representation of Biomedical Data Sources and Records [Data & Record as Subjects]

Filed under: Bioinformatics,Biomedical,Medical Informatics,Ontology,RDF — Patrick Durusau @ 7:27 pm

An Ontological Representation of Biomedical Data Sources and Records by Michael Bada, Kevin Livingston, and Lawrence Hunter.

Abstract:

Large RDF-triple stores have been the basis of prominent recent attempts to integrate the vast quantities of data in semantically divergent databases. However, these repositories often conflate data-source records, which are information content entities, and the biomedical concepts and assertions denoted by them. We propose an ontological model for the representation of data sources and their records as an extension of the Information Artifact Ontology. Using this model, we have consistently represented the contents of 17 prominent biomedical databases as a 5.6-billion RDF-triple knowledge base, enabling querying and inference over this large store of integrated data.

Recognition of the need to treat data containers as subjects, along with the data they contain, is always refreshing.

In particular because the evolution of data sources can be captured, as the authors remark:

Our ontology is fully capable of handling the evolution of data sources: If the schema of a given data set is changed, a new instance of the schema is simply created, along with the instances of the fields of the new schema. If the data sets of a data source change (or a new set is made available), an instance for each new data set can be created, along with instances for its schema and fields. (Modeling of incremental change rather than creation of new instances may be desirable but poses significant representational challenges.) Additionally, using our model, if a researcher wishes to work with multiple versions of a given data source (e.g., to analyze some aspect of multiple versions of a given database), an instance for each version of the data source can be created. If different versions of a data source consist of different data sets (e.g., different file organizations) and/or different schemas and fields, the explicit representation of all of these elements and their linkages will make the respective structures of the disparate data-source versions unambiguous. Furthermore, it may be the case that only a subset of a data source needs to be represented; in such a case, only instances of the data sets, schemas, and fields of interest are created.

I first saw this in a tweet by Anita de Waard.

November 10, 2012

IOGDS: International Open Government Dataset Search

Filed under: Dataset,Linked Data,RDF,SPARQL — Patrick Durusau @ 9:21 am

IOGDS: International Open Government Dataset Search

Description:

The TWC International Open Government Dataset Search (IOGDS) is a linked data application based on metadata “scraped” from hundreds of international dataset catalog websites publishing a rich variety of government data. Metadata extracted from these catalog websites is automatically converted to RDF linked data and re-published via the TWC LOGD SPARQL endpoint and made available for download. The TWC IOGDS demo site features an efficient, reconfigurable faceted browser with search capabilities offering a compelling demonstration of the value of a common metadata model for open government dataset catalogs. We believe that the vocabulary choices demonstrated by IOGDS highlights the potential for useful linked data applications to be created from open government catalogs and will encourage the adoption of such a standard worldwide.

In addition to the datasets you will find tutorials, videos, demos, tools and technologies and other resources.

Whether you are looking for Linked Data or Linked Data to re-use in other ways.

Seen in a tweet by Tim O’Reilly.

November 9, 2012

Eleven SPARQL 1.1 Specifications Published

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 7:00 am

Eleven SPARQL 1.1 Specifications Published

From the post:

The SPARQL Working Group has today published a set of eleven documents, advancing most of SPARQL 1.1 to Proposed Recommendation. Building on the success of SPARQL 1.0, SPARQL 1.1 is a full-featured standard system for working with RDF data, including a query/update language, two HTTP protocols (one full-featured, one using basic HTTP verbs), three result formats, and other features which allow SPARQL endpoints to be combined and work together. Most features of SPARQL 1.1 have already been implemented by a range of SPARQL suppliers, as shown in our table of implementations and test results.

The Proposed Recommendations are:

  1. SPARQL 1.1 Overview – Overview of SPARQL 1.1 and the SPARQL 1.1 documents
  2. SPARQL 1.1 Query Language – A query language for RDF data.
  3. SPARQL 1.1 Update – Specifies additions to the query language to allow clients to update stored data
  4. SPARQL 1.1 Query Results JSON Format – How to use JSON for SPARQL query results
  5. SPARQL 1.1 Query Results CSV and TSV Formats – How to use comma-separated values (CSV) and tab-separated values (TSV) for SPARQL query results
  6. SPARQL Query Results XML Format – How to use XML for SPARQL query results. (This contains only minor, editorial updates from SPARQL 1.0, and is actually a Proposed Edited Recommendation.)
  7. SPARQL 1.1 Federated Query – an extension of the SPARQL 1.1 Query Language for executing queries distributed over different SPARQL endpoints.
  8. SPARQL 1.1 Service Description – a method for discovering and a vocabulary for describing SPARQL services.

While you are waiting for news on SPARQL performance increases, some reading material to pass the time.

November 8, 2012

Federated SPARQL Queries [Take “Hit” From Multiple/Distributed Data Sets]

Filed under: BigData,RDF,SPARQL — Patrick Durusau @ 5:30 pm

On the Impact of Data Distribution in Federated SPARQL Queries by Nur Aini Rakhmawati and Michael Hausenblas.

Abstract:

With the growing number of publicly available SPARQL endpoints, federated queries become more and more attractive and feasible. Compared to queries against a single endpoint, queries that range over a number of endpoints pose new challenges, ranging from the type and number of datasets involved to the data distribution across the datasets. Existing research focuses on the data distribution in a central store and is mainly concerned with adopting well-known, traditional database techniques. In this work we investigate the impact of the data distribution in the context of federated SPARQL queries. We perform a number of experiments with four federation frameworks (Sesame Alibaba, Splendid, FedX, and Darq) against an RDF dataset, Dailymed, that we partition by graph and class. Our preliminary results confirm the intuition that the more datasets involved in query processing, the worse performance of federation query is and that the data distribution significantly influences the performance.

It isn’t often I read in the same paragraph:

With the growing number of publicly available SPARQL endpoints, federated queries become more and more attractive and feasible.

and

Our preliminary results confirm the intuition that the more datasets involved in query processing, the worse performance of federation query is and that the data distribution significantly influences the performance.

I have trouble reconciling “…more and more attractive and feasible” with “…the more datasets…the worse performance of federation query is….”

Particularly in the age of “big data” where an increasing number of datasets and data distribution are the norms, not exceptions.

I commend the authors for creating data points to confirm “intuitions” about SPARQL performance.

At the same time, their results raise serious questions about SPARQL in big data environments.
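For readers who have not written one, a federated query is an ordinary SPARQL 1.1 query with one or more SERVICE blocks naming remote endpoints. A minimal sketch (the local endpoint and the data are illustrative only):

from SPARQLWrapper import SPARQLWrapper, JSON

# The local endpoint evaluates the query; the SERVICE block ships part of it elsewhere.
local = SPARQLWrapper("http://localhost:8890/sparql")   # illustrative local endpoint
local.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?drug ?label
    WHERE {
      ?drug a <http://example.org/Drug> .               # triples held locally
      SERVICE <http://dbpedia.org/sparql> {             # triples fetched remotely
        ?drug rdfs:label ?label .
        FILTER (lang(?label) = "en")
      }
    }
    LIMIT 10
""")
local.setReturnFormat(JSON)

for row in local.query().convert()["results"]["bindings"]:
    print(row["drug"]["value"], row["label"]["value"])

Each SERVICE block is at least one extra network round trip, which hints at where the performance numbers above come from.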

November 7, 2012

VocBench

Filed under: Linked Data,RDF,SKOS,VocBench — Patrick Durusau @ 5:52 pm

VocBench

From the webpage:

VocBench is a web-based, multilingual, vocabulary editing and workflow tool developed by FAO. It transforms thesauri, authority lists and glossaries into SKOS/RDF concept schemes for use in a linked data environment. VocBench provides tools and functionalities that facilitate the collaborative editing of multilingual terminology and semantic concept information. It further includes administration and group management features as well as built in workflows for maintenance, validation and quality assurance of the data pool.

The current release is 1.3, but 2.0 is due out “Autumn 2012” as open source under the GPL license.

Another tool that will be of interest to topic map authors.

October 21, 2012

Relational Data to RDF [Bridge to Nowhere?]

Filed under: R2ML,RDF,SPARQL — Patrick Durusau @ 4:13 pm

Transforming Relational Data to RDF – R2RML Becomes Official W3C Recommendation by Eric Franzon.

From the post:

Today, the World Wide Web Consortium announced that R2RML has achieved Recommendation status. As stated on the W3C website, R2RML is “a language for expressing customized mappings from relational databases to RDF datasets. Such mappings provide the ability to view existing relational data in the RDF data model, expressed in a structure and target vocabulary of the mapping author’s choice.” In the life cycle of W3C standards creation, today’s announcement means that the specifications have gone through extensive community review and revision and that R2RML is now considered stable enough for wide-spread distribution in commodity software.

Richard Cyganiak, one of the Recommendation’s editors, explained why R2RML is so important. “In the early days of the Semantic Web effort, we’ve tried to convert the whole world to RDF and OWL. This clearly hasn’t worked. Most data lives in entrenched non-RDF systems, and that’s not likely to change.”

“That’s why technologies that map existing data formats to RDF are so important,” he continued. “R2RML builds a bridge between the vast amounts of existing data that lives in SQL databases and the SPARQL world. Having a standard for this makes SPARQL even more useful than it already is, because it can more easily access lots of valuable existing data. It also means that database-to-RDF middleware implementations can be more easily compared, which will create pressure on both open-source and commercial vendors, and will increase the level of play in the entire field.” (emphasis added)

If most data resides in non-RDF systems, what do I gain by converting it into RDF for querying with SPARQL?

Some possible costs:

  • Planning the conversion from non-RDF to RDF system
  • Debugging the conversion (unless it is trivial, the first few conversions won’t be right)
  • Developing the SPARQL queries
  • Debugging the SPARQL queries
  • Updating the conversion if new data is added to the source
  • Testing the SPARQL query against updated data
  • Maintenance of the source and target RDF systems (unless pushing SPARQL is a way to urge conversion from relational system)

Or to put it another way, if most data still lives in non-RDF data stores, why do I need a bridge to the SPARQL world?

Or is this a Sarah Palin bridge to nowhere?

October 17, 2012

Neo4J, RDF and Kevin Bacon

Filed under: Graphs,Neo4j,RDF — Patrick Durusau @ 9:01 am

Neo4J, RDF and Kevin Bacon by Tom Morris.

From the post:

Today, I managed to wangle my way into Off the Rails, a train hack day. I was helping friends with data mangling: OpenStreetMap, Dbpedia, RDF and Neo4J.

It’s funny actually. Way back when, if I said to people that there is some data that fits quite well into graph models, they’d look at me like some kind of dangerous looney. Graphs? Why? Doesn’t MySQL and JSON do everything I need?

Actually, no.

If you are trying to model a system where there are trains that travel on tracks between stations, that maps quite nicely to graphs, nodes and edges. If only there were databases and data models for that stuff, right?

Oh, yeah, there is. There’s Neo4J and there’s our old friend RDF, and the various triple store databases. I finally had a chance to play with Neo4J today. It’s pretty cool. And it shows us one of the primary issues with the RDF toolchain: it usually fails to implement the one thing any reasonable person wants from a graph store.

Kevin Bacon. Finding shortest path from one node to another with some kind of predicate filter. If you ask people what the one thing they want to do with a graph is, they’ll say: shortest path.

This is what Neo4J makes easy. I can download Neo4J in a Java (or JRuby, Scala, whatever) project, instantiate a database in the form of an embedded database, kinda like SQLite in Rails, parse a load of nodes and relations into it, then in two damn lines of Java find the shortest path between nodes.
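The post uses Neo4J’s embedded Java API, but the same “Kevin Bacon” question is just as short in Cypher. A rough sketch with the current Python driver; the connection details, labels and property names are all made up:

from neo4j import GraphDatabase

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (a:Station {name: $start}), (b:Station {name: $end}),
      p = shortestPath((a)-[:CONNECTS_TO*..15]-(b))
RETURN [n IN nodes(p) | n.name] AS route
"""

with driver.session() as session:
    record = session.run(query, start="Paddington", end="Kings Cross").single()
    print(record["route"] if record else "no path found")

driver.close()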

The proper starting point for any project is: What questions do you want to ask?

Discovering the answer to that question will point you toward an appropriate technology.

See also: Shortest path problem (and improve the answer while you are there).

September 21, 2012

RDF triple stores — an overview

Filed under: RDF — Patrick Durusau @ 4:26 pm

RDF triple stores — an overview by Lars Marius Garshol.

From the post:

There’s a huge range of triple stores out there, and it’s not trivial to find the one most suited for your exact needs. I reviewed all those I could find earlier this year for a project, and here is the result. I’ve evaluated the stores against the requirements that mattered for that particular project. I haven’t summarized the scores, as everyone’s weights for these requirements will be different.

I’ve deliberately left out rows for whether these tools support things like R2RML, query federation, data binding, SDshare, and so on, even though many of them do. The rationale is that if you pick a triple store that doesn’t support these things you can get support anyway through separate components.

I’ve also deliberately left out cloud-only offerings, as I feel these are a different type of product from the databases you can install and maintain locally.

If you are looking for an RDF triple store, check the post for the full table.

I first saw this at SemanticWeb.com.

September 18, 2012

Mind maps just begging for RDF triples…. [human understanding = computer interpretation?]

Filed under: Mind Maps,RDF — Patrick Durusau @ 9:05 pm

Mind maps just begging for RDF triples and formal models by Kerstin Forsberg.

From the post:

Earlier this week CDISC English Speaking User Group (ESUG) Committee arranged a webinar: “CDISC SHARE – How SHARE is developing as a project/standard” with Simon Bishop, Standards and Operations Director, GSK. I did find the comprehensive presentation from Simon, and his colleague Diana Wold, very interesting.

Interesting as the presentation in an excellent way exemplifies how “Current standards (company standards, SDTM standards, other standards) do not current deliver the capability we require” Also, I do find the presentation interesting as it exemplifies mind maps as a way forward as “Diagrams help us understand clinical processes and how this translates into datasets and variables.” (Quotes from slide 20 in the presentation: Conclusions.)

Below a couple of examples of mind maps from the presentation. And also, the background to my thinking that they are Mind maps just begging for RDF triples and formal models of the clinical and biomedical reality to make them fully ready “both for human understanding and for computer interpretation“.

Interesting post but the:

both for human understanding and for computer interpretation

is what caught my attention.

Always a good thing to improve the ability of computers to find things for us. To the extent RDF can do that, great!

But human understanding is far deeper and more complex than any computer, by RDF or other means, can achieve.

I think we need to keep the distinction between human understanding and computer interpretation firmly in mind.

I first saw this at the SemanticWeb.com.

September 8, 2012

So Long, and Thanks for All The Triples – OKG Shuts Down

Filed under: Google Knowledge Graph,RDF — Patrick Durusau @ 2:48 pm

So Long, and Thanks for All The Triples – OKG Shuts Down by Eric Franzon.

From the post:

I take no pleasure in being right.

Earlier this week, I speculated that the Open Knowledge Graph might be scaled back or shut down. This morning, I awoke to a post by the project’s creators, Thomas Steiner and Stefan Mirea announcing the closing of the OKG.

Eric and the original announcement both quote Jack Menzel, Product Management Director at Google, as making the following statement:

“We try to make data as accessible as possible to people around the world, which is why we put as much data as we can in Freebase. However there are a few reasons we can’t participate in your project.

First, the reason we can’t put all the data we have into Freebase is that we’ve acquired it from other sources who have not granted us the rights to redistribute. Much of the local and books data, for example, was given to us with terms that we would not immediately syndicate or provide it to others for free.

Other pieces of data are used, but only with attribution. For example, some data, like images, we feel comfortable using only in the context of search (as it is a preview of content that people will be finding with that search) and some data like statistics from the World Bank should only be shown with proper attribution.

With regards to automatic access to extract the ranking of the content: we block this kind of access to Google because our ranking is the proprietary core of what Google provides whenever you use search—users should access Google via the interfaces we provide.”

I can summarize that for you:

The Open Knowledge Graph (OKG) project is incompatible with the ad-driven business model of Google.

If you want the long version:

  • …not granted us the rights to redistribute.” Google engineered contracts for content that mandate its delivery/presentation via Google ad-driven interfaces. The “…not granted us…” language is always a tip off.
  • …but only with attribution.” That means the Google ad-driven interface as the means for attribution. Their choice you notice.
  • …ranking of content…block….” Probably the most honest part of the quote. Our facts, our revenue stream and we say no.

Illustrates a problem with ad-driven business models:

No ads, no revenue, which means you use our interfaces.

Value-add models avoid that but only with subscription models.

(Do you see another way?)

