Archive for the ‘Linked Data’ Category

Putting Linked Data on the Map

Monday, May 13th, 2013

Putting Linked Data on the Map by Richard Wallis.

In fairness to Linked Data/Semantic Web, I really should mention this post by one of its more mainstream advocates:

Show me an example of the effective publishing of Linked Data – That, or a variation of it, must be the request I receive more than most when talking to those considering making their own resources available as Linked Data, either in their enterprise, or on the wider web.

There are some obvious candidates. The BBC for instance, makes significant use of Linked Data within its enterprise. They built their fantastic Olympics 2012 online coverage on an infrastructure with Linked Data at its core. Unfortunately, apart from a few exceptions such as Wildlife and Programmes, we only see the results in a powerful web presence. The published data is only visible within their enterprise.

Dbpedia is another excellent candidate. From about 2007 it has been a clear demonstration of Tim Berners-Lee’s principles of using URIs as identifiers and providing information, including links to other things, in RDF – it is just there at the end of the dbpedia URIs. But for some reason developers don’t seem to see it as a compelling example. Maybe it is influenced by the Wikipedia effect – interesting but built by open data geeks, so not to be taken seriously.

A third example, which I want to focus on here, is Ordnance Survey. Not generally known much beyond the geographical patch they cover, Ordnance Survey is the official mapping agency for Great Britain. Formally a government agency, they are best known for their incredibly detailed and accurate maps that are the standard accessory for anyone doing anything in the British countryside. A little less known is that they also publish information about post-code areas, parish/town/city/county boundaries, parliamentary constituency areas, and even European regions in Britain. As you can imagine, these all don’t neatly intersect, which makes the data about them a great case for a graph based data model and hence for publishing as Linked Data. Which is what they did a couple of years ago.

The reason I want to focus on their efforts now, is that they have recently beta released a new API suite, which I will come to in a moment. But first I must emphasise something that is often missed.

Linked Data is just there – without the need for an API the raw data (described in RDF) is ‘just there to consume’. With only standard [http] web protocols, you can get the data for an entity in their dataset by just doing a http GET request on the identifier…

(images omitted)
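As a minimal sketch of what "just doing a http GET request on the identifier" can look like from Python, assuming the server honours content negotiation through the Accept header: the URI below is a placeholder rather than a real Ordnance Survey identifier, and the helper name is made up for illustration. The request is only built here; handing it to urllib.request.urlopen would actually send it.

```python
from urllib.request import Request

# Media types commonly used for RDF serializations.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.8"


def build_rdf_request(entity_uri: str) -> Request:
    """Build (but do not send) a GET request asking for RDF about entity_uri."""
    return Request(entity_uri, headers={"Accept": RDF_ACCEPT}, method="GET")


# Hypothetical identifier, used only to show the shape of the call.
request = build_rdf_request("http://example.org/id/some-entity")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Note that the sketch stops at retrieval; it does nothing to help with understanding the data that comes back.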

Richard does a great job describing the Linked Data APIs from the Ordnance Survey.

My only quibble is with his point:

Linked Data is just there – without the need for an API the raw data (described in RDF) is ‘just there to consume’.

True enough but it omits the authoring side of Linked Data.

Or understanding the data to be consumed.

With HTML, authoring hyperlinks was only marginally more difficult than “using” hyperlinks.

And the consumption of a hyperlink, beyond mime types, was unconstrained.

So linked data isn’t “just there.”

It’s there with an authoring burden that remains unresolved and that constrains consumption, should you decide to follow “standard [http] web protocols” and Linked Data.

I am sure the Ordnance Survey Linked Data and other Linked Data resources Richard mentions will be very useful, to some people in some contexts.

But pretending Linked Data is easier than it is will not lead to improved Linked Data or other semantic solutions.

The ChEMBL database as linked open data

Thursday, May 9th, 2013

The ChEMBL database as linked open data by Egon L Willighagen, Andra Waagmeester, Ola Spjuth, Peter Ansell, Antony J Williams, Valery Tkachenko, Janna Hastings, Bin Chen and David J Wild. (Journal of Cheminformatics 2013, 5:23 doi:10.1186/1758-2946-5-23).

Abstract:

Background Making data available as Linked Data using Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis.

Results This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferencable, linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying.

Conclusions We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the presented decision support.

You already know about the fragility of ontologies so no need to repeat that rant here.

Having material encoded with an ontology, on the other hand, after vetting, can be a source that you wrap with a topic map.

So all that effort isn’t lost.

Linked CSV (Encouraging Development)

Tuesday, May 7th, 2013

Linked CSV by Jeni Tennison.

Abstract:

Many open data sets are essentially tables, or sets of tables, which follow the same regular structure. This document describes a set of conventions for CSV files that enable them to be linked together and to be interpreted as RDF.

An encouraging observation in the draft:

Linked CSV is built around the concept of using URIs to name things. Every record, column, and even slices of data, in a linked CSV file is addressable using URI Identifiers for the text/csv Media Type. For example, if the linked CSV file is accessed at http://example.org/countries, the first record in the CSV file above, which happens to be the first data line within the linked CSV file (which describes Andorra) is addressable with the URI:

http://example.org/countries#row:0

However, this addressing merely identifies the records within the linked CSV file, not the entities that the record describes. This distinction is important for two reasons:

  • a single entity may be described by multiple records within the linked CSV file
  • addressing entities and records separately enables us to make statements about the source of the information within a particular record

By default, each data line describes an entity, each entity is described by a single data line, and there is no way to address the entities. However, adding a $id column enables entities to be given identifiers. These identifiers are always URIs, and they are interpreted relative to the location of the linked CSV file. The $id column may be positioned anywhere but by convention it should be the first column (unless there is a # column, in which case it should be the second). For example:

Hopefully Jeni is setting a trend in Linked Data circles of distinguishing locations from entities.
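The location/entity distinction can be sketched in a few lines of Python. This is an illustration under assumptions, not the draft's reference behaviour: the file URL, the column names other than $id, and the two data lines are invented, and the #row:N fragment simply follows the form shown in the quoted passage.

```python
import csv
import io
from urllib.parse import urljoin

# Hypothetical file location and contents, modelled on the quoted example.
CSV_URL = "http://example.org/countries"
CSV_TEXT = """$id,country,name
#AD,AD,Andorra
#AE,AE,United Arab Emirates
"""


def describe_rows(csv_url: str, csv_text: str):
    """Yield (record_uri, entity_uri, fields) for each data line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader):
        # The record URI addresses a location inside the file.
        record_uri = f"{csv_url}#row:{index}"
        # The entity URI comes from the $id column, resolved against the file URL.
        raw_id = row.pop("$id", None)
        entity_uri = urljoin(csv_url, raw_id) if raw_id else None
        yield record_uri, entity_uri, row


for record_uri, entity_uri, fields in describe_rows(CSV_URL, CSV_TEXT):
    print(record_uri, entity_uri, fields["name"])
```

Keeping the two URIs separate is what lets a statement about the source of a record stay apart from a statement about the country itself.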

I first saw this in Christophe Lalanne’s A bag of tweets / April 2013.

Scientific Lenses over Linked Data… [Operational Equivalence]

Sunday, April 28th, 2013

Scientific Lenses over Linked Data: An approach to support task specific views of the data. A vision. by Christian Brenninkmeijer, Chris Evelo, Carole Goble, Alasdair J G Gray, Paul Groth, Steve Pettifer, Robert Stevens, Antony J Williams, and Egon L Willighagen.

Abstract:

Within complex scientific domains such as pharmacology, operational equivalence between two concepts is often context-, user- and task-specific. Existing Linked Data integration procedures and equivalence services do not take the context and task of the user into account. We present a vision for enabling users to control the notion of operational equivalence by applying scientific lenses over Linked Data. The scientific lenses vary the links that are activated between the datasets which affects the data returned to the user.

Two additional quotes from this paper should convince you of the importance of this work:

We aim to support users in controlling and varying their view of the data by applying a scientific lens which govern the notions of equivalence applied to the data. Users will be able to change their lens based on the task and role they are performing rather than having one fixed lens. To support this requirement, we propose an approach that applies context dependent sets of equality links. These links are stored in a stand-off fashion so that they are not intermingled with the datasets. This allows for multiple, context-dependent, linksets that can evolve without impact on the underlying datasets and support differing opinions on the relationships between data instances. This flexibility is in contrast to both Linked Data and traditional data integration approaches. We look at the role personae can play in guiding the nature of relationships between the data resources and the desired affects of applying scientific lenses over Linked Data.

and,

Within scientific datasets it is common to find links to the “equivalent” record in another dataset. However, there is no declaration of the form of the relationship. There is a great deal of variation in the notion of equivalence implied by the links both within a dataset’s usage and particularly across datasets, which degrades the quality of the data. The scientific user personae have very different needs about the notion of equivalence that should be applied between datasets. The users need a simple mechanism by which they can change the operational equivalence applied between datasets. We propose the use of scientific lenses.
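A rough sketch of the idea, not the authors' implementation: links are held apart from the datasets, each link declares its kind of equivalence, and a lens is simply the set of kinds it activates. All record identifiers, kind names and lens names below are invented, and treating activated links as transitive is an assumption of the sketch.

```python
from collections import defaultdict

# Hypothetical stand-off links: (record_a, record_b, kind_of_equivalence).
LINKS = [
    ("chembl:1", "chemspider:9", "same_structure"),
    ("chemspider:9", "drugbank:4", "same_active_ingredient"),
    ("chembl:1", "chembl:2", "same_parent_compound"),
]

# Each lens activates a subset of the equivalence kinds.
LENSES = {
    "strict": {"same_structure"},
    "pharmacology": {"same_structure", "same_active_ingredient"},
}


def equivalents(record, lens, links=LINKS, lenses=LENSES):
    """Return every record reachable from `record` over links the lens activates."""
    active = lenses[lens]
    neighbours = defaultdict(set)
    for a, b, kind in links:
        if kind in active:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Depth-first walk over the activated links only.
    seen, stack = {record}, [record]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(sorted(equivalents("chembl:1", "strict")))
print(sorted(equivalents("chembl:1", "pharmacology")))
```

Changing the lens changes the answer without touching either the datasets or the stored links.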

Obvious questions:

Does your topic map software support multiple operational equivalences?

Does your topic map interface enable users to choose “lenses” (I like lenses better than roles) to view equivalence?

Does your topic map software support declaring the nature of equivalence?

I first saw this in the slide deck: Scientific Lenses: Supporting Alternative Views of the Data by Alasdair J G Gray at: 4th Open PHACTS Community Workshop.

BTW, the notion of equivalence being represented by “links” reminds me of a comment Peter Neubauer (Neo4j) once made to me, saying that equivalence could be modeled as edges. Imagine typing equivalence edges. Will have to think about that some more.

4th Open PHACTS Community Workshop (slides) [Operational Equivalence]

Sunday, April 28th, 2013

4th Open PHACTS Community Workshop : Using the power of Open PHACTS

From the post:

The fourth Open PHACTS Community Workshop was held at Burlington House in London on April 22 and 23, 2013. The Workshop focussed on “Using the Power of Open PHACTS” and featured the public release of the Open PHACTS application programming interface (API) and the first Open PHACTS example app, ChemBioNavigator.

The first day featured talks describing the data accessible via the Open PHACTS Discovery Platform and technical aspects of the API. The use of the API by example applications ChemBioNavigator and PharmaTrek was outlined, and the results of the Accelrys Pipeline Pilot Hackathon discussed.

The second day involved discussion of Open PHACTS sustainability and plans for the successor organisation, the Open PHACTS Foundation. The afternoon was attended by those keen to further discuss the potential of the Open PHACTS API and the future of Open PHACTS.

During talks, especially those detailing the Open PHACTS API, a good number of signup requests to the API via dev.openphacts.org were received. The hashtag #opslaunch was used to follow reactions to the workshop on Twitter (see storify), and showed the response amongst attendees to be overwhelmingly positive.

This summary is followed by slides from the two days of presentations.

Not like being there but still quite useful.

As a matter of fact, I found a lead on “operational equivalence” with this data set. More to follow in a separate post.

Apache Marmotta (incubator)

Saturday, April 13th, 2013

Apache Marmotta (incubator)

From the webpage:

Apache Marmotta (incubator) is an Open Platform for Linked Data.

The goal of Apache Marmotta is to provide an open implementation of a Linked Data Platform that can be used, extended and deployed easily by organizations who want to publish Linked Data or build custom applications on Linked Data.

Right now the project is being set up in the Apache Software Foundation infrastructure. The team is working to have available to download in the upcoming weeks the first release under incubator. Check the development section for further details on how we work or subscribe to our mailing lists to follow the project’s day to day.

Features

  • Read-Write Linked Data
  • RDF triple store with transactions, versioning and rule-based reasoning
  • SPARQL and LDPath query languages
  • Transparent Linked Data Caching
  • Integrated security mechanisms

Background

Marmotta comes as a continuation of the work in the Linked Media Framework project. LMF is an easy-to-setup server application that bundles some technologies such as Apache Stanbol or Apache Solr to offer some advanced services. After the release 2.6, the Read-Write Linked Data server code and some related libraries have been set aside to incubate Marmotta within the Apache Software Foundation. LMF still keeps exactly the same functionality, but now bundling Marmotta too.

If a client wants a Linked Data Platform, the least you can do is recommend one from Apache.

Linked Data and Law

Saturday, April 13th, 2013

Linked Data and Law

A listing of linked data and law resources maintained by the Legal Informatics Blog.

Most recently updated to reflect the availability of the Library of Congress K Class (Law Classification) as linked data.

Law Classification Added to Library of Congress Linked Data Service

Saturday, April 13th, 2013

Law Classification Added to Library of Congress Linked Data Service by Kevin Ford.

From the post:

The Library of Congress is pleased to make the K Class – Law Classification – and all its subclasses available as linked data from the LC Linked Data Service, ID.LOC.GOV. K Class joins the B, N, M, and Z Classes, which have been in beta release since June 2012. With about 2.2 million new resources added to ID.LOC.GOV, K Class is nearly eight times larger than the B, M, N, and Z Classes combined. It is four times larger than the Library of Congress Subject Headings (LCSH). If it is not the largest class, it is second only to the P Class (Literature) in the Library of Congress Classification (LCC) system.

We have also taken the opportunity to re-compute and reload the B, M, N, and Z classes in response to a few reported errors. Our gratitude to Caroline Arms for her work crawling through B, M, N, and Z and identifying a number of these issues.

Please explore the K Class for yourself at http://id.loc.gov/authorities/classification/K or all of the classes at http://id.loc.gov/authorities/classification.

The classification section of ID.LOC.GOV remains a beta offering. More work is needed not only to add the additional classes to the system but also to continue to work out issues with the data.

As always, your feedback is important and welcomed. Your contributions directly inform service enhancements. We are interested in all forms of constructive commentary on all topics related to ID. But we are particularly interested in how the data available from ID.LOC.GOV is used and continue to encourage the submission of use cases describing how the community would like to apply or repurpose the LCC data.

You can send comments or report any problems via the ID feedback form or ID listserv.

Not leisure reading for everyone but if you are interested, this is fascinating source material.

And an important source of information for potential associations between subjects.

I first saw this at: Ford: Law Classification Added to Library of Congress Linked Data Service.

The Project With No Name

Thursday, April 4th, 2013

Fujitsu Labs And DERI To Offer Free, Cloud-Based Platform To Store And Query Linked Open Data by Jennifer Zaino.

From the post:

The Semantic Web Blog reported last year about a relationship formed between the Digital Enterprise Research Institute (DERI) and Fujitsu Laboratories Ltd. in Japan, focused on a project to build a large-scale RDF store in the cloud capable of processing hundreds of billions of triples. At the time, Dr. Michael Hausenblas, who was then a DERI research fellow, discussed Fujitsu Lab’s research efforts related to the cloud, its huge cloud infrastructure, and its identification of Big Data as an important trend, noting that “Linked Data is involved with answering at least two of the three Big Data questions” – that is, how to deal with volume and variety (velocity is the third).

This week, the DERI and Fujitsu Lab partners have announced a new data storage technology that stores and queries interconnected Linked Open Data, to be available this year, free of charge, on a cloud-based platform. According to a press release about the announcement, the data store technology collects and stores Linked Open Data that is published across the globe, and facilitates search processing through the development of a caching structure that is specifically adapted to LOD.

Typically, search performance deteriorates when searching for common elements that are linked together within data because of requirements around cross-referencing of massive data sets, the release says. The algorithm it has developed — which takes advantage of links in LOD link structures typically being concentrated in only a portion of server nodes, and of past usage frequency — caches only the data that is heavily accessed in cross-referencing to reduce disk accesses, and so accelerate searching.
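The release gives no algorithmic detail, so the following is only a generic sketch of frequency-based caching, not the DERI/Fujitsu method. It keeps an item in memory once it has been requested a threshold number of times and ignores link structure and server placement entirely. The class name, the threshold and the in-memory stand-in for disk are all invented for illustration.

```python
from collections import Counter


class FrequencyCache:
    """Keep an item in memory only once it has been requested `threshold` times."""

    def __init__(self, load_from_disk, threshold=2):
        self._load = load_from_disk      # slow lookup, e.g. a disk read
        self._threshold = threshold
        self._hits = Counter()           # past usage frequency per key
        self._cache = {}
        self.disk_reads = 0

    def get(self, key):
        self._hits[key] += 1
        if key in self._cache:
            return self._cache[key]
        value = self._load(key)
        self.disk_reads += 1
        if self._hits[key] >= self._threshold:
            self._cache[key] = value     # heavily accessed: keep it
        return value


# Hypothetical "disk": a dict standing in for slow storage.
store = {"a": 1, "b": 2}
cache = FrequencyCache(store.__getitem__)
for key in ["a", "a", "a", "b"]:
    cache.get(key)
print(cache.disk_reads)
```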

Not sure what it means for the project between DERI and Fujitsu to have no name. Or at least no name in the press releases.

Until that changes, may I suggest: DERI and Fujitsu Project With No Name (DFPWNN)? ;-)

With or without a name I was glad for DERI because, well, I like research and they do it quite well.

DFPWNN’s better query technology for LOD will demonstrate, in my opinion, the same semantic diversity found at Swoogle.

Linking up semantically diverse content means just that, a lot of semantically diverse content, linked up.

The bill for leaving semantic diversity as a problem to address “later” is about to come due.

Tensors and Their Applications…

Saturday, March 23rd, 2013

Tensors and Their Applications in Graph-Structured Domains by Maximilian Nickel and Volker Tresp. (Slides.)

Along with the slides, you will like the abstract and bibliography found at: Machine Learning on Linked Data: Tensors and their Applications in Graph-Structured Domains.

Abstract:

Machine learning has become increasingly important in the context of Linked Data as it is an enabling technology for many important tasks such as link prediction, information retrieval or group detection. The fundamental data structure of Linked Data is a graph. Graphs are also ubiquitous in many other fields of application, such as social networks, bioinformatics or the World Wide Web. Recently, tensor factorizations have emerged as a highly promising approach to machine learning on graph-structured data, showing both scalability and excellent results on benchmark data sets, while matching perfectly to the triple structure of RDF. This tutorial will provide an introduction to tensor factorizations and their applications for machine learning on graphs. By the means of concrete tasks such as link prediction we will discuss several factorization methods in-depth and also provide necessary theoretical background on tensors in general. Emphasis is put on tensor models that are of interest to Linked Data, which will include models that are able to factorize large-scale graphs with millions of entities and known facts or models that can handle the open-world assumption of Linked Data. Furthermore, we will discuss tensor models for temporal and sequential graph data, e.g. to analyze social networks over time.
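To make "matching perfectly to the triple structure of RDF" concrete, here is a small sketch (mine, not taken from the tutorial) that turns a handful of invented triples into a third-order 0/1 tensor with one entity-by-entity slice per predicate. Factorization methods then work on slices like these; the sketch covers the data layout only.

```python
# Hypothetical triples (subject, predicate, object).
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
]


def triples_to_tensor(triples):
    """Return (entities, relations, tensor) with tensor[k][i][j] == 1
    when the triple (entities[i], relations[k], entities[j]) is present."""
    entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
    relations = sorted({t[1] for t in triples})
    e_index = {e: i for i, e in enumerate(entities)}
    r_index = {r: k for k, r in enumerate(relations)}
    n = len(entities)
    # One n-by-n slice of zeros per relation.
    tensor = [[[0] * n for _ in range(n)] for _ in relations]
    for s, p, o in triples:
        tensor[r_index[p]][e_index[s]][e_index[o]] = 1
    return entities, relations, tensor


entities, relations, tensor = triples_to_tensor(TRIPLES)
print(entities, relations)
```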

Devising a system to deal with the heterogeneous nature of linked data.

Just skimming the slides I could see, this looks very promising.

I first saw this in a tweet by Stefano Bertolo.


Update: I just got an email from Maximilian Nickel and he has altered the transition between slides. Working now!

From slide 53 forward is pure gold for topic map purposes.

Heavy sledding but let me give you one statement from the slides that should capture your interest:

Instance matching: Ranking of entities by their similarity in the entity-latent-component space.
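A minimal sketch of that ranking step, assuming each entity already has a vector in the latent-component space. The vectors and entity names below are invented, and cosine similarity is my choice of measure for illustration, not necessarily the one used in the slides.

```python
import math

# Hypothetical latent vectors, as a factorization might produce them.
LATENT = {
    "entity_a": [0.9, 0.1, 0.3],
    "entity_b": [0.8, 0.2, 0.3],
    "entity_c": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_matches(entity, latent=LATENT):
    """Rank every other entity by similarity to `entity`, most similar first."""
    target = latent[entity]
    scores = [
        (other, cosine(target, vec))
        for other, vec in latent.items()
        if other != entity
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for name, score in rank_matches("entity_a"):
    print(name, round(score, 3))
```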

Although written about linked data, not limited to linked data.

What is more, Maximilian offers proof that the technique scales!

Complex, configurable, scalable determination of subject identity!

[Update: deleted note about issues with slides, which read: (Slides for ISWC 2012 tutorial, Chrome is your best bet. Even better bet, Chrome on Windows. Chrome on Ubuntu crashed every time I tried to go to slide #15. Windows gets to slide #46 before failing to respond. I have written to inquire about the slides.)]

Beacons of Availability

Sunday, March 17th, 2013

From Records to a Web of Library Data – Pt3 Beacons of Availability by Richard Wallis.

Beacons of Availability

As I indicated in the first of this series, there are descriptions of a broader collection of entities, than just books, articles and other creative works, locked up in the Marc and other records that populate our current library systems. By mining those records it is possible to identify those entities, such as people, places, organisations, formats and locations, and model & describe them independently of their source records.

As I discussed in the post that followed, the library domain has often led in the creation and sharing of authoritative datasets for the description of many of these entity types. Bringing these two together, using URIs published by the Hubs of Authority, to identify individual relationships within bibliographic metadata published as RDF by individual library collections (for example the British National Bibliography, and WorldCat) is creating Library Linked Data openly available on the Web.

Why do we catalogue? is a question, I often ask, with an obvious answer – so that people can find our stuff. How does this entification, sharing of authorities, and creation of a web of library linked data help us in that goal. In simple terms, the more libraries can understand what resources each other hold, describe, and reference, the more able they are to guide people to those resources. Sounds like a great benefit and mission statement for libraries of the world but unfortunately not one that will nudge the needle on making library resources more discoverable for the vast majority of those that can benefit from them.

I have lost count of the number of presentations and reports I have seen telling us that upwards of 80% of visitors to library search interfaces start in Google. A similar weight of opinion can be found that complains how bad Google, and the other search engines, are at representing library resources. You will get some balancing opinion, supporting how good Google Book Search and Google Scholar are at directing students and others to our resources. Yet I am willing to bet that again we have another 80-20 equation or worse about how few, of the users that libraries want to reach, even know those specialist Google services exist. A bit of a sorry state of affairs when the major source of searching for our target audience, is also acknowledged to be one of the least capable at describing and linking to the resources we want them to find!

Library linked data helps solve both the problem of better description and findability of library resources in the major search engines. Plus it can help with the problem of identifying where a user can gain access to that resource to loan, download, view via a suitable license, or purchase, etc.

I am an ardent sympathizer with helping people to find “our stuff.”

I don’t disagree with the description of Google as: “…the major source of searching for our target audience, is also acknowledged to be one of the least capable at describing and linking to the resources we want them to find!”

But in all fairness to Google, I would remind you of Drabenstott’s research that found for the Library of Congress subject headings:

Overall percentages of correct meanings for subject headings in the original order of subdivisions were as follows:

children 32%
adults 40%
reference 53%
technical services librarians 56%

The Library of Congress subject classification has been around for more than a century and just over half of the librarians can use it correctly.

Let’s not wait more than a century to test the claim:*

Library linked data helps solve both the problem of better description and findability of library resources in the major search engines.


* By “test” I don’t mean the sort of study, “…we recruited twelve LIS students but one had to leave before the study was complete….”

I am using “test” in the sense of a well designed and organized social science project with professional assistance from social scientists, UI test designers and the like.

I think OCLC is quite sincere in its promotion of linked data, but effectiveness is an empirical question, not one of sincerity.

From Records to a Web of Library Data – Pt2 Hubs of Authority

Saturday, March 16th, 2013

From Records to a Web of Library Data – Pt2 Hubs of Authority by Richard Wallis.

From the post:

Hubs of Authority

Libraries, probably because of their natural inclination towards cooperation, were ahead of the game in data sharing for many years. The moment computing technology became practical, in the late sixties, cooperative cataloguing initiatives started all over the world either in national libraries or cooperative organisations. Two from personal experience come to mind, BLCMP started in Birmingham, UK in 1969 eventually evolved in to the leading Semantic Web organisation Talis, and in 1967 Dublin, Ohio saw the creation of OCLC. Both in their own way having had significant impact on the worlds of libraries, metadata, and the web (and me!).

One of the obvious impacts of inter-library cooperation over the years has been the authorities, those sources of authoritative names for key elements of bibliographic records. A large number of national libraries have such lists of agreed formats for author and organisational names. The Library of Congress has in addition to its name authorities, subjects, classifications, languages, countries etc. Another obvious success in this area is VIAF, the Virtual International Authority File, which currently aggregates over thirty authority files from all over the world – well used and recognised in library land, and increasingly across the web in general as a source of identifiers for people & organisations.

These, Linked Data enabled, sources of information are developing importance in their own right, as a natural place to link to, when asserting the thing, person, or concept you are identifying in your data. As Sir Tim Berners-Lee’s fourth principle of Linked Data tells us to “Include links to other URIs. so that they can discover more things”. VIAF in particular is becoming such a trusted, authoritative, source of URIs that there is now a VIAFbot responsible for interconnecting Wikipedia and VIAF to surface hundreds of thousands of relevant links to each other. A great hat-tip to Max Klein, OCLC Wikipedian in Residence, for his work in this area.

I don’t deny that VIAF is a very useful tool, but if you search for the personal name “Marilyn Monroe,” it returns:

1. Miller, Arthur, 1915-2005
National Library of Australia National Library of the Czech Republic National Diet Library (Japan) Deutsche Nationalbibliothek RERO (Switzerland) SUDOC (France) Library and Archives Canada National Library of Israel (Latin) National Library of Sweden NUKAT Center (Poland) Bibliothèque nationale de France Biblioteca Nacional de España Library of Congress/NACO

Miller, Arthur (Arthur Asher), 1915-2005
National Library of the Netherlands-test

Miller, Arthur, 1915-
Vatican Library Biblioteca Nacional de Portugal

ميلر، ارثر، 1915-2005 م.
Bibliotheca Alexandrina (Egypt)

Miller, Arthur
Wikipedia (en)-test

מילר, ארתור, 1915-2005
National Library of Israel (Hebrew)

2. Monroe, Marilyn, 1926-1962
National Library of Israel (Latin) National Library of the Czech Republic National Diet Library (Japan) Deutsche Nationalbibliothek SUDOC (France) Library and Archives Canada National Library of Australia National Library of Sweden NUKAT Center (Poland) Bibliothèque nationale de France Biblioteca Nacional de España Library of Congress/NACO

Monroe, Marilyn
National Library of the Netherlands-test Wikipedia (en)-test RERO (Switzerland)

Monroe, Marilyn American actress, model, and singer, 1926-1962
Getty Union List of Artist Names

Monroe, Marilyn, pseud.
Biblioteca Nacional de Portugal

3. DiMaggio, Joe, 1914-1999
Library of Congress/NACO Bibliothèque nationale de France

Di Maggio, Joe 1914-1999
Deutsche Nationalbibliothek

Di Maggio, Joseph Paul, 1914-1999
National Diet Library (Japan)

DiMaggio, Joe, 1914-
National Library of Australia

Dimaggio, Joseph Paul, 1914-1999
SUDOC (France)

DiMaggio, Joe (Joseph Paul), 1914-1999
National Library of the Netherlands-test

Dimaggio, Joe
Wikipedia (en)-test

4. Monroe, Marilyn
Deutsche Nationalbibliothek

5. Hurst-Monroe, Marlene
Library of Congress/NACO

6. Wolf, Marilyn Monroe
Deutsche Nationalbibliothek

Maybe Sir Tim is right, users “…can discover more things.”

Some of them are related, some of them are not.

From Records to a Web of Library Data – Pt1 Entification

Saturday, March 16th, 2013

From Records to a Web of Library Data – Pt1 Entification by Richard Wallis.

From the post:

Entification

Entification – a bit of an ugly word, but in my day to day existence one I am hearing more and more. What an exciting life I lead…

What is it, and why should I care, you may be asking.

I spend much of my time convincing people of the benefits of Linked Data to the library domain, both as a way to publish and share our rich resources with the wider world, and also as a potential stimulator of significant efficiencies in the creation and management of information about those resources. Taking those benefits as being accepted, for the purposes of this post, brings me into discussion with those concerned with the process of getting library data into a linked data form.

As you know, I am far from convinced about the “benefits” of Linked Data, at least with its current definition.

Who knows what definition “Linked Data” may have in some future vision of the W3C? (URL Homonym Problem: A Topic Map Solution, a tale of how the W3C decided to redefine URL.)

But Richard’s point about the ugliness and utility of “entification” is well taken.

So long as you remember that every term can be described “in terms of other things.”

There are no primitive terms, not one.

Linked Data Platform 1.0 (W3C)

Thursday, March 7th, 2013

Linked Data Platform 1.0 (W3C)

Abstract:

A set of best practices and simple approach for a read-write Linked Data architecture, based on HTTP access to web resources that describe their state using the RDF data model.

Just in case you ever encounter such a platform.
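Should you encounter one, the "read-write over HTTP" idea might look roughly like this from a client. It is a sketch under assumptions: the container URL, the Turtle payload and the use of a Slug header as a naming hint are illustrative guesses about a particular server, not requirements quoted from the abstract. The request is built but never sent.

```python
from urllib.request import Request

# Hypothetical RDF payload describing the resource to be created.
TURTLE = b"""@prefix dcterms: <http://purl.org/dc/terms/> .
<> dcterms:title "A hypothetical resource" .
"""


def build_create_request(container_url: str, slug: str, body: bytes) -> Request:
    """Build (but do not send) a POST that asks a container to create a resource."""
    headers = {"Content-Type": "text/turtle", "Slug": slug}
    return Request(container_url, data=body, headers=headers, method="POST")


# Hypothetical container URL, used only to show the shape of the call.
request = build_create_request("http://example.org/container/", "note-1", TURTLE)
print(request.get_method(), request.full_url)
```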

Linked Data for Holdings and Cataloging

Monday, February 25th, 2013

From the ALA Midwinter Meeting:

Linked Data for Holdings and Cataloging: The First Step Is Always the Hardest! by Eric Miller (Zepheira) and Richard Wallis (OCLC). (Video + Slides)

Linked Data for Holdings and Cataloging: Interactive Session. (Audio)

Since linked data wasn’t designed for human users, the advantage for library catalogs isn’t clear.

Most users can’t use LCSH so perhaps the lack of utility will go unnoticed. (Subject Headings and the Semantic Web)

I first saw this at: Linked Data for Holdings and Cataloging – recordings now available!

NBA Stats Like Never Before [No RDF/Linked Data/Topic Maps In Sight]

Saturday, February 16th, 2013

NBA Stats Like Never Before by Timo Elliott.

From the post:

The National Basketball Association today unveiled a new site for fans of games statistics: NBA.com/stats, powered by SAP Analytics technology. The multi-year marketing partnership between SAP and the NBA was announced six months ago:

“We are constantly researching new and emerging technologies in an effort to provide our fans with new ways to connect with our game,” said NBA Commissioner David Stern. “SAP is a leader in providing innovative software solutions and an ideal partner to provide a dynamic and comprehensive statistical offering as fans interact with NBA basketball on a global basis.”

“SAP is honored to partner with the NBA, one of the world’s most respected sports organizations,” said Bill McDermott, co-CEO, SAP. “Through SAP HANA, fans will be able to experience the NBA as never before. This is a slam dunk for SAP, the NBA and the many fans who will now have access to unprecedented insight and analysis.”

The free database contains every box score of every game played since the league’s inception in 1946, including graphical displays of players shooting tendencies.

To the average fan NBA.com/Stats delivers information that is of immediate interest to them, not their computers.

Another way to think about it:

Computers don’t make purchasing decisions, users do.

Something to think about when deciding on your next semantic technology.

Saving the “Semantic” Web (part 2) [NOTLogic]

Monday, February 11th, 2013

Expressing Your Semantics: NOTLogic

Saving the “Semantic” Web (part 1) ended by concluding that authors of data/content should be asked about the semantics of their content.

I asked if there were compelling reasons to ask someone else and got no takers.

The acronym NOTLogic may not be familiar. It expands to: Not Only Their Logic.

Users should express their semantics in the “logic” of their domain.

After all, it is their semantics, knowledge and domain that are being captured.

Their “logic” may not square up with FOL (first order logic) but where’s the beef?

Unless one of the project requirements is to maintain consistency with FOL, why bother?

The goal in most BI projects is ROI on capturing semantics, not adhering to FOL for its own sake.

Some people want to teach calculators how to mimic “reasoning” by using that subset known as “logic.”

However much I liked the Friden rotary calculator of my youth:

[Image: Calculator]

teaching it to mimic “reasoning” isn’t going to happen on my dime.

What about yours?

There are cases where machine learning techniques are very productive and fully justified.

The question you need to ask yourself (after discovering if you should be using RDF at all, The Semantic Web Is Failing — But Why? (Part 2)) is whether “their” logic works for your use case.

I suspect you will find that you can express your semantics, including relationships, without resort to FOL.

Which may lead you to wonder: Why would anyone want you to use a technique they know, but you don’t?

I don’t know for sure but have some speculations on that score I will share with you tomorrow.

In the meantime, remember:

  1. As the author of content or data, you are the person to ask about its semantics.
  2. You should express your semantics in a way comfortable for you.

AGROVOC 2013 edition released

Monday, February 11th, 2013

AGROVOC 2013 edition released

From the post:

The AGROVOC Team is pleased to announce the release of the AGROVOC 2013 edition.

The updated version contains 32,188 concepts in up to 22 languages, resulting in a total of 626,211 terms (in 2012: 32,061 concepts, 625,096 terms).

Please explore AGROVOC by searching terms, or browsing hierarchies.

AGROVOC 2013 is available for download, and accessible via web services.

From the “about” page:

The AGROVOC thesaurus contains 32,188 concepts in up to 22 languages covering topics related to food, nutrition, agriculture, fisheries, forestry, environment and other related domains.

A global community of editors consisting of librarians, terminologists, information managers and software developers, maintain AGROVOC using VocBench, an open-source multilingual, web-based vocabulary editor and workflow management tool that allows simultaneous, distributed editing. AGROVOC is expressed in Simple Knowledge Organization System (SKOS) and published as Linked Data.

Need some seeds for your topic map in “…food, nutrition, agriculture, fisheries, forestry, environment and other related domains”?

The Semantic Web Is Failing — But Why? (Part 3)

Thursday, February 7th, 2013

Is Linked Data the Answer?

Leaving the failure of users to understand RDF semantics to one side, there is also the issue of the complexity of its various representations.

Consider Kingsley Idehen’s “simple” example Turtle document, which he posted in: Simple Linked Data Deployment via Turtle Docs using various Storage Services:

##### Starts Here #####
# Note: the hash is a comment character in Turtle
# Content start
# You can save this to a local file. In my case I use Local File Name: kingsley.ttl .
# Actual Content:

# prefix decalaration that enable the use of compact identifiers instead of fully expanded
# HTTP URIs.

@prefix owl:   .
@prefix foaf:  .
@prefix rdfs:  .
@prefix wdrs:  .
@prefix opl:  .
@prefix cert:  .
@prefix:<#>.

# Profile Doc Stuff

<> a foaf:Document .
<> rdfs:label "DIY Linked Data Doc About: kidehen" .
<> rdfs:comment "Simple Turtle File That Describes Entity: kidehen " .

# Entity Me Stuff

<> foaf:primaryTopic :this .
<> foaf:maker :this .
:this a foaf:Person .
:this wdrs:describedby <> .
:this foaf:name "Kingsley Uyi Idehen" .
:this foaf:firstName "Kingsley" .
:this foaf:familyName "Idehen" .
:this foaf:nick "kidehen" .
:this owl:sameAs  .
:this owl:sameAs  .
:this owl:sameAs  .
:this owl:sameAs  .
:this foaf:page  .
:this foaf:page  .
:this foaf:page  .
:this foaf:page  .
:this foaf:knows , , , , ,  .

# Entity Me: Identity & WebID Stuff 

#:this cert:key :pubKey .
#:pubKey a cert:RSAPublicKey;
# Public Key Exponent
# :pubkey cert:exponent "65537" ^^ xsd:integer;
# Public Key Modulus
# :pubkey cert:modulus "d5d64dfe93ab7a95b29b1ebe21f3cd8a6651816c9c39b87ec51bf393e4177e6fc
2ee712d92caf9d9f1423f5e65f127274529a2e6cc53f1e452c6736e8db8732f919c4160eaa9b6f327c8617c
40036301b547abfc4c5de610780461b269e3d8f8e427237da6152ac2047d88ff837cddae793d15427fa7ce
067467834663737332be467eb353be678bffa7141e78ce3052597eae3523c6a2c414c2ae9f8d7be807bb3
fc0d516b8ecd2fafee4f20ff3550919601a0ad5d29126fb687c2e8c156f04918a92c4fc09f136473f3303814e1
83185edf0046e124e856ca7ada027345e614f8d665f5d7172d880497005ff4626c2b0f2206f7dce717e4f279
dd2a0ddf04b" ^^ xsd:hexBinary .

# :this opl:hasCertificate :cert .
# :cert opl:fingerprint "640F9DD4CFB6DD6361CBAD12C408601E2479CC4A" ^^ xsd:hexBinary;
#:cert opl:hasPublicKey "d5d64dfe93ab7a95b29b1ebe21f3cd8a6651816c9c39b87ec51bf393e4177e6fc2
ee712d92caf9d9f1423f5e65f127274529a2e6cc53f1e452c6736e8db8732f919c4160eaa9b6f327c8617c400
36301b547abfc4c5de610780461b269e3d8f8e427237da6152ac2047d88ff837cddae793d15427fa7ce06746
7834663737332be467eb353be678bffa7141e78ce3052597eae3523c6a2c414c2ae9f8d7be807bb3fc0d516b
8ecd2fafee4f20ff3550919601a0ad5d29126fb687c2e8c156f04918a92c4fc09f136473f3303814e183185edf00
46e124e856ca7ada027345e614f8d665f5d7172d880497005ff4626c2b0f2206f7dce717e4f279dd2a0ddf04b"
^^ xsd:hexBinary .

### Ends or Here###

Try handing that “simple” example and Idehen’s article to some non-technical person in your office to gauge its “simplicity.”

For that matter, hand it to some of your technical but non-Semantic Web folks as well.

Your experience with that exercise will speak louder than anything I can say.


The next series starts with Saving the “Semantic” Web (Part 1)

BBC …To Explore Linked Data Technology [Instead of hand-curated content management]

Friday, February 1st, 2013

BBC News Lab to Explore Linked Data Technology by Angela Guess.

From the post:

Matt Shearer of the BBC recently reported that the BBC’s News Lab team will begin exploring linked data technologies. He writes, “Hi I’m Matt Shearer, delivery manager for Future Media News. I manage the delivery of the News Product and I also lead on BBC News Labs. BBC News Labs is an innovation project which was started during 2012 to help us harness the BBC’s wider expertise to explore future opportunities. Generally speaking BBC News believes in allowing creative technologists to innovate and influence the direction of the News product. For example the delivery of BBC News’ responsive design mobile service started in 2011 when we made space for a multidiscipline project to explore responsive design opportunities for BBC News. With this in mind the BBC News team setup News Labs to explore linked data technologies.”

Shearer goes on, “The BBC has been making use of linked data technologies in its internal content production systems since 2011. As explained by Jem Rayfield this enabled the publishing of news aggregation pages ‘per athlete’, ‘per sport’ and ‘per event’ for the 2012 Olympics – something that would not have been possible with hand-curated content management. Linked data is being rolled out on BBC News from early 2013 to enrich the connections between BBC News stories, content assets, the wider BBC website and the World Wide Web. We framed each challenge/opportunity for the News Lab in terms of a clear ‘problem space’ (as opposed to a set of requirements that may limit options) supported by research findings, audience needs, market needs, technology opportunities and framed with the BBC News Strategy.”

Read more here.

(emphasis added)

Apologies for the long quote but I wanted to capture the BBC’s comparison of using linked data to hand-curated content management in context.

I never dreamed the BBC was still using “hand-curated content management” as a measure of modern IT systems.

Quite remarkable.

On the other hand, perhaps they were being kind to the linked data experiment by using a measure that enables it to excel.

If you know which one, please comment.

Thanks!

Is Linked Data the future of data integration in the enterprise?

Tuesday, January 15th, 2013

Is Linked Data the future of data integration in the enterprise? by John Walker.

From the post:

Following the basic Linked Data principles we have assigned HTTP URIs as names for things (resources) providing an unambiguous identifier. Next up we have converted data from a variety of sources (XML, CSV, RDBMS) into RDF.

One of the key features of RDF is the ability to easily merge data about a single resource from multiple source into a single “supergraph” providing a more complete description of the resource. By loading the RDF into a graph database, it is possible to make an endpoint available which can be queried using the SPARQL query language. We are currently using Dydra as their cloud-based database-as-a-service model provides an easy entry route to using RDF without requiring a steep learning curve (basically load your RDF and you’re away), but there are plenty of other options like Apache Jena and OpenRDF Sesame. This has made it very easy for us to answer to complex questions requiring data from multiple sources, moreover we can stand up APIs providing access to this data in minutes.

By using a Linked Data Plaform such as Graphity we can make our identifiers (HTTP URIs) dereferencable. In layman’s terms when someone plugs the URI into a browser, we provide a description of the resource in HTML. Using content negotiation we are able to provide this data in one of the standard machine-readable XML, JSON or Turtle formats. Graphity uses Java and XSLT 2.0 which our developers already have loads of experience with and provides powerful mechanisms with which we will be able to develop some great web apps.

What do you make of:

One of the key features of RDF is the ability to easily merge data about a single resource from multiple source into a single “supergraph” providing a more complete description of the resource.

???

I suppose if by some accident we all use the same URI as an identifier, that would be the case. But that hardly requires URIs, Linked Data or RDF.

Scientific conferences on digital retrieval in the 1950s worried about diversity of nomenclature being a barrier to discovery of resources. If we haven’t addressed the semantic diversity issue in sixty (60) years of talking about it, it isn’t clear how creating another set of diverse names is going to help.

There may be other reasons for using URIs but seamless merging doesn’t appear to be one of them.
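
A minimal sketch of the point, in Python, with plain tuples standing in for RDF triples. The URIs, property names and the merge function are all hypothetical, invented for illustration; this is not how Dydra, Jena or Sesame work internally. Merging here is just a union keyed on the subject URI, so descriptions combine only where the sources already agree on the URI:

```python
from collections import defaultdict

# Hypothetical URIs and properties, invented for illustration only.
source_a = [
    ("http://example.org/a/part/42", "name", "Widget"),
    ("http://example.org/a/part/42", "weight", "10 g"),
]
source_b = [
    # Same real-world part, but source B minted its own URI for it.
    ("http://example.com/b/items/0042", "name", "Widget"),
    ("http://example.com/b/items/0042", "price", "3.50"),
]
source_c = [
    # Source C happens to reuse source A's URI.
    ("http://example.org/a/part/42", "color", "red"),
]


def merge(*sources):
    """Union the triples of all sources, grouped by subject URI."""
    graph = defaultdict(set)
    for triples in sources:
        for subject, predicate, obj in triples:
            graph[subject].add((predicate, obj))
    return dict(graph)


merged = merge(source_a, source_b, source_c)
for subject, properties in sorted(merged.items()):
    print(subject, sorted(properties))
```

Sources A and C merge because they share a URI. Source B describes the same part under a different URI, so its price stays in a separate description until someone asserts that the two URIs identify the same subject.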

Moreover, how do I know what you have identified with a URI?

You can return one or more properties for a URI, but which ones matter for the identity of the subject it identifies?

I first saw this at Linked Data: The Future of Data Integration by Angela Guess.

@AIMS Webinars on Linked Data

Wednesday, January 9th, 2013

@AIMS Webinars on Linked Data

From the website:

The traditional approach of sharing data within silos seems to have reached its end. From governments and international organizations to local cities and institutions, there is a widespread effort of opening up and interlinking their data. Linked Data, a term coined by Tim Berners-Lee in his design note regarding the Semantic Web architecture, refers to a set of best practices for publishing, sharing, and interlinking structured data on the Web.

Linked Open Data (LOD), a concept that has leapt onto the scene in the last years, is Linked Data distributed under an open license that allows its reuse for free. Linked Open Data becomes a key element to achieve interoperability and accessibility of data, harmonisation of metadata and multilinguality.

There are four remaining seminars in this series:

Webinar in French | 22nd January 2013 – 11:00am Rome time
Clarifiez le sens de vos données publiques grâce au Web de données
Christophe Guéret, Royal Netherlands Academy of Arts and Sciences, Data Archiving and Networked Services (DANS)

Webinar in Chinese | 29th January 2013 – 02:00am Rome time
基于网络的研讨会 “题目:理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和使用者”
Marcia Zeng, School of Library and Information Science, Kent State University

Webinar in Russian | 5th February 2013 – 11:00am Rome time
Введение в концепцию связанных открытых данных
Irina Radchenko, Centre of Semantic Technologies, Higher School of Economics

Webinar in Arabic | 12th February 2013 – 11:00am Rome time
Ibrahim Elbadawi, UAE Federal eGovernment

Mark your agenda! New Free Webinars @ AIMS on Linked Open Data for registration and more details.

Callimachus Version 1.0

Friday, January 4th, 2013

Callimachus Version 1.0 by Eric Franzon.

From the post:

The Callimachus Project has announced that the latest release of the Open Source version of Callimachus is available for immediate download.

Callimachus began as a linked data management system in 2009 and is an Open Source system for navigating, managing, visualizing and building applications on Linked Data.

Version 1.0 introduces several new features, including:

  • Built-in support for most types of Persistent URLs (PURLs), including Active PURLs.
  • Scripted HTTP content type conversions via XProc pipelines.
  • Ability to access remote Linked Data via SPARQL SERVICE keyword and XProc pipelines.
  • Named Queries can now have a custom view page. The view page can be a template for the resources in the query result.
  • Authorization can now be performed based on IP addresses or the DNS domain of the client.

10 Rules for Persistent URIs [Actually only one] Present of Persistent URIs

Monday, December 24th, 2012

Interoperability Solutions for European Public Administrations got into the egg nog early:

D7.1.3 – Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC (PDF) (I’m not kidding, go see for yourself.)

Five (5) positive rules:

  1. Follow the pattern: http://(domain)/(type)/(concept)/(reference)
  2. Re-use existing identifiers
  3. Link multiple representations
  4. Implement 303 redirects for real-world objects
  5. Use a dedicated service

Five (5) negative rules:

  1. Avoid stating ownership
  2. Avoid version numbers
  3. Avoid using auto-increment
  4. Avoid query strings
  5. Avoid file extensions
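
For the rules that can be checked mechanically, here is a rough sketch in Python. The function name, the regular expressions and the example URIs are my assumptions, not anything taken from the report, and the checks are heuristics only. Ownership and auto-increment, for instance, cannot be detected from the string alone:

```python
import re
from urllib.parse import urlsplit


def check_persistent_uri(uri):
    """Return a list of rule violations found by simple heuristics."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    problems = []
    # Positive rule 1: http://(domain)/(type)/(concept)/(reference)
    if len(segments) != 3:
        problems.append("path is not /(type)/(concept)/(reference)")
    # Negative rule 4: avoid query strings
    if parts.query:
        problems.append("query string present")
    # Negative rule 5: avoid file extensions (guessed from a trailing ".xyz")
    if segments and re.search(r"\.[A-Za-z0-9]{1,5}$", segments[-1]):
        problems.append("file extension present")
    # Negative rule 2: avoid version numbers (only catches segments like "v2")
    if any(re.fullmatch(r"v\d+(\.\d+)*", s, re.IGNORECASE) for s in segments):
        problems.append("version number present")
    return problems


# Hypothetical example URIs.
good = check_persistent_uri("http://data.example.org/id/school/st-marys")
bad = check_persistent_uri("http://data.example.org/v2/schools/list.php?id=7")
print(good)
print(bad)
```

Note that nothing in these checks says whether the URI will still resolve in ten years.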

If the goal is “persistent” URIs, only the “Use a dedicated service” rule has any relationship to making URIs “persistent.”

That is, five (5) or ten (10) years from now, a URI used as an identifier will return the same value as today.

The other nine rules have no relationship to persistence. Good arguments can be made for some of them, but persistence isn’t one of them.

Why the report hides behind the rhetoric of persistence I cannot say.

But you can satisfy yourself that only a “dedicated service” can persist a URI, whatever its form.

W3C confusion over identifiers and locators for web resources continues to plague this area.

There isn’t anything particularly remarkable about using a URI as an identifier. So long as it is understood that URI identifiers are just like any other identifier.

That is, they can be indexed, annotated, searched for and returned to users with data about the object of the identification.

Viewed that way, the fact that once upon a time there was a resource at the location specified by a URI has little or nothing to do with the persistence of that URI.

So long as we have indexed the URI, that index can serve as a resolution of that URI/identifier for as long as the index persists. With additional information should we choose to create and provide it.
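
As a sketch of what I mean, assuming nothing more than an in-memory dictionary (the class name, method names and example URI below are hypothetical, made up for illustration), an index can “resolve” a URI without ever contacting the location the URI once named:

```python
class IdentifierIndex:
    """Maps URI identifiers to recorded descriptions and annotations."""

    def __init__(self):
        self._entries = {}

    def record(self, uri, **description):
        # Add to, or extend, what the index knows about this identifier.
        self._entries.setdefault(uri, {}).update(description)

    def resolve(self, uri):
        # No HTTP request is made; the URI is treated as an opaque key.
        return self._entries.get(uri)


index = IdentifierIndex()
uri = "http://example.org/id/person/1"  # hypothetical identifier
index.record(uri, label="Example Person")
index.record(uri, note="Original server retired; description kept here.")

print(index.resolve(uri))
print(index.resolve("http://example.org/id/person/2"))
```

The second `record` call is the “additional information” part: the index can say more about the subject than the original location ever did. An identifier that was never indexed resolves to nothing, which is the only sense in which it fails to persist.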

The EU document concedes as much when it says:

Without exception, all the use cases discussed in section 3 where a policy of URI persistence has been adopted, have used a dedicated service that is independent of the data originator. The Australian National Data Service uses a handle resolver, Dublin Core uses purl.org, services, data.gov.uk and publications.europa.eu are all also independent of a specific government department and could readily be transferred and run by someone else if necessary. This does not imply that a single service should be adopted for multiple data providers. On the contrary – distribution is a key advantage of the Web. It simply means that the provision of persistent URIs should be independent of the data originator.

That is, if you read: “…independent of the data originator” to mean independent of a particular location on the WWW.

No changes in form, content, protocols, server software, etc., required. And you get persistent URIs.

Merry Christmas to all and to all…, persistent URIs as identifiers (not locators)!

(I first saw this at: New Report: 10 Rules for Persistent URIs)

Best Buy Product Catalog via Semantic Endpoints

Thursday, December 20th, 2012

Announcing BBYOpen Metis Alpha: Best Buy Product Catalog via Semantic Endpoints

From the post:

Announcing BBYOpen Metis Alpha: Best Buy Product Catalog via Semantic Endpoints

These days, consumers have a rich variety of products available at their fingertips. A massive product landscape has evolved, but sadly products in this enormous and rich landscape often get flattened to just a price tag. Over time, it seems the product value proposition, variety, descriptions, specifics, and details that make up products have all but disappeared. This presents consumers with a "paradox of choice" where misinformed decisions can lead to poor product selections, and ultimately product returns and customer remorse.

To solve this problem, BBY Open is excited to announce the first phase Alpha release of Metis, our semantically-driven product insight engine. As part of a phased release approach, this first release consists of publishing all 500K+ of our active Best Buy products with reviews as RDF-enabled endpoints for public consumption.

This alpha release is the first phase in solving this product ambiguity. With the publishing of structured product data in RDF format using industry accepted product ontologies like GoodRelations, standards from the Semantic Web group at the W3C, and the NetKernel platform, the Metis Alpha gives developers the ability to consume and query structured data via SPARQL (get up to speed with Learning SPARQL by Bob DuCharme), enabling the discovery of insight hidden deep inside the product catalog.

Comments?

Linked Jazz

Thursday, December 13th, 2012

Linked Jazz

Network display of Jazz artists with a number of display options.

Using Linked Data.

Better network display than I am accustomed to and I know that Lars likes jazz. ;-)

I first saw this in a tweet by Christophe Viau.

PS: You may also like the paper: Visualizing Linked Jazz: A web-based tool for social network analysis and exploration.

Optique

Thursday, December 13th, 2012

Optique

From the homepage:

Scalable end-user access to Big Data is critical for effective data analysis and value creation. Optique will bring about a paradigm shift for data access:

  • by providing a semantic end-to-end connection between users and data sources;
  • enabling users to rapidly formulate intuitive queries using familiar vocabularies and conceptualisations;
  • seamlessly integrating data spread across multiple distributed data sources, including streaming sources;
  • exploiting massive parallelism for scalability far beyond traditional RDBMSs and thus reducing the turnaround time for information requests to minutes rather than days.

Another new EU data project.

Website reports first software will be available towards the end of 2013.

Not much in the way of specifics but it is very early in the project.

Can anyone point me to a public version of their funding application?

I have been given to understand that funding applications have more detail than may appear in public announcements.

PS: I had trouble downloading a presentation by Peter Haase that is cited on the website so when I obtained it, I uploaded a local copy: On Demand Access to Big Data Through Semantic Technologies. (PDF)

I have seen the Linked Data cloud illustration many times. Have you seen it in comparison with the overall data cloud?

Developing CODE for a Research Database

Tuesday, December 11th, 2012

Developing CODE for a Research Database by Ian Armas Foster.

From the post:

The fact that there are a plethora of scientific papers readily available online would seem helpful to researchers. Unfortunately, the truth is that the volume of these articles has grown such that determining which information is relevant to a specific project is becoming increasingly difficult.

Austrian and German researchers are thus developing CODE, or Commercially Empowered Linked Open Data Ecosystems in Research, to properly aggregate research data from its various forms, such as PDFs of academic papers and data tables upon which those papers are based, into a single system. The project is in a prototype stage, with the goal being to integrate all forms into one platform by the project’s second year.

The researchers from the University of Passau in Germany and the Know-Center in Graz, Austria explored the challenges to CODE and how the team intends to deal with those challenges in this paper. The goal is to meliorate the research process by making it easier to not only search for both text and numerical data in the same query but also to use both varieties in concert. The basic architecture for the project is shown below.

Stop me if you have heard this one before: “There was this project that was going to disambiguate entities and create linked data….”

I would be the first one to cheer if such a project were successful. But a few paragraphs in a paper, given the long history of entity resolution and its difficulties, aren’t enough to raise my hopes.

You?

Library Hi Tech Journal seeks papers on LOV & LOD

Saturday, December 8th, 2012

Library Hi Tech Journal seeks papers on LOV & LOD

From the post:

Library Hi Tech (LHT) seeks papers about new works, initiatives, trends and research in the field of linking and opening vocabularies. This call for papers is inspired by the 2012 LOV Symposium: Linking and Opening Vocabularies symposium and SKOS-2-HIVE —Helping Interdisciplinary Vocabulary Engineering workshop—held at the Universidad Carlos III de Madrid (UC3M).

This Library Hi Tech special issue might include papers delivered at the UC3M-LOV events and other original works related with this subject, not yet published.

Topics: LOV & LOD

Papers specifically addressing research and development activities, implementation challenges and solutions, and educative aspects of Linked Open Vocabularies (LOV) and/or in a broader sense Linked Open Data, are of particular interest.

Those interested in submitting an article should send papers before 30 January 2013. Full articles should be between 4,000 and 8,000 words. References should use the Harvard style. Please submit completed articles via the Scholar One online submission system. All final submissions will be peer reviewed.

On the style for references, you may find the Author Guidelines at LHT useful.

More generally, see Harvard System, posted by the University Library of Anglia Ruskin University.