Another Word For It – Patrick Durusau on Topic Maps and Semantic Diversity

September 8, 2011

Press.net News Ontologies & rNews

Filed under: Linked Data,LOD,Ontology — Patrick Durusau @ 5:58 pm

Press.net News Ontologies

From the webpage:

The news ontology is comprised of several ontologies, which describe assets (text, images, video) and the events and entities (people, places, organisations, abstract concepts etc.) that appear in news content. The asset model is the representation of news content as digital assets created by a news provider (e.g. text, images, video and data such as csv files). The domain model is the representation of the ‘real world’ which is the subject of news. There are simple entities, which we have labelled with the amorphous term of ‘stuff’ and complex entities. Currently, the only complex entity the ontology is concerned with is events. The term stuff has been used to include abstract and intangible concepts (e.g. Feminism, Love, Hate Crime etc.) as well as tangible things (e.g. Lord Ashdown, Fiat Punto, Queens Park Rangers).

Assets (news content) are about things in the world (the domain model). The connection between assets and the entities that appear in them is made using tags. Assets are further holistically categorised using classification schemes (e.g. IPTC Media Topic Codes, Schema.org Vocabulary or Press Association Categorisations).

No sooner had I seen that on the LOD list than Stéphane Corlosquet pointed out rNews, another ontology for news.

From the rNews webpage:

rNews is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents. The rNews proposal has been developed by the IPTC, a consortium of the world’s major news agencies, news publishers and news industry vendors. rNews is currently in draft form and the IPTC welcomes feedback on how to improve the standard in the rNews Forum.

I am sure there are others.

Although I rather like “stuff” as an alternative to SUMO’s “thing”, or was that Cyc’s?

The point being that mapping strategies, when the expense can be justified, are the “answer” to the semantic diversity and richness of human discourse.

September 2, 2011

Improving the recall of decentralised linked data querying through implicit knowledge

Filed under: Linked Data,LOD,SPARQL — Patrick Durusau @ 8:02 pm

Improving the recall of decentralised linked data querying through implicit knowledge by Jürgen Umbrich, Aidan Hogan, and Axel Polleres.

Abstract:

Aside from crawling, indexing, and querying RDF data centrally, Linked Data principles allow for processing SPARQL queries on-the-fly by dereferencing URIs. Proposed link-traversal query approaches for Linked Data have the benefits of up-to-date results and decentralised (i.e., client-side) execution, but operate on incomplete knowledge available in dereferenced documents, thus affecting recall. In this paper, we investigate how implicit knowledge – specifically that found through owl:sameAs and RDFS reasoning – can improve the recall in this setting. We start with an empirical analysis of a large crawl featuring 4 m Linked Data sources and 1.1 g quadruples: we (1) measure expected recall by only considering dereferenceable information, (2) measure the improvement in recall given by considering rdfs:seeAlso links as previous proposals did. We further propose and measure the impact of additionally considering (3) owl:sameAs links, and (4) applying lightweight RDFS reasoning (specifically ρDF) for finding more results, relying on static schema information. We evaluate our methods for live queries over our crawl.

From the document:

owl:sameAs links are used to expand the set of query relevant sources, and owl:sameAs rules are used to materialise implicit knowledge given by the OWL semantics, potentially generating additional answers.

I have always thought that knowing the “why” behind an owl:sameAs assertion would make it more powerful. But since any basis for subject sameness can be used, that may not be the case.
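To make the materialisation idea concrete, here is a minimal sketch in Python with rdflib (not the paper's implementation; all URIs are invented). Statements made about an alias become visible on the original subject once owl:sameAs is materialised:

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, OWL

ex = Namespace("http://example.org/")          # hypothetical namespace
dbp = Namespace("http://dbpedia.example/")     # hypothetical namespace

g = Graph()
g.add((ex.TimBL, FOAF.name, Literal("Tim Berners-Lee")))
g.add((ex.TimBL, OWL.sameAs, dbp.Tim_Berners_Lee))
g.add((dbp.Tim_Berners_Lee, FOAF.made, URIRef("http://www.w3.org/")))

# Naive, single-pass owl:sameAs materialisation: copy statements about the
# alias onto the subject (a real reasoner computes the full closure).
for a, _, b in list(g.triples((None, OWL.sameAs, None))):
    for s, p, o in list(g):
        if s == b:
            g.add((a, p, o))
        if o == b and p != OWL.sameAs:
            g.add((s, p, a))

print(list(g.objects(ex.TimBL, FOAF.made)))    # now includes http://www.w3.org/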

August 30, 2011

LC Name Authority File Available as Linked Data

Filed under: Law - Sources,Legal Informatics,Linked Data — Patrick Durusau @ 7:13 pm

LC Name Authority File Available as Linked Data

From Legal Informatics Blog:

The Library of Congress has made available the LC Name Authority File as Linked Data.

The data are available in several formats, including RDF/XML, N-Triples, and JSON.

Of particular interest to the legal informatics community is the fact that the Linked Data version of the LC Name Authority File includes records for names of very large numbers of government entities — as well as of other kinds of organizations, such as corporations, and individuals — of the U.S., Canada, the U.K., France, India, and many other nations. The file also includes many records for individual statutes.

Interesting post that focuses on law-related authority records.
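A quick sketch of consuming one of those records with Python's rdflib (the LCCN below is a placeholder; substitute a real one from the authority file, and note this assumes the server answers with an RDF serialization rdflib recognizes):

from rdflib import Graph

# Placeholder record URI from the LC Name Authority File.
uri = "http://id.loc.gov/authorities/names/n00000000"

g = Graph()
g.parse(uri)   # dereferences the URI, negotiating an RDF format

for s, p, o in list(g)[:10]:
    print(s, p, o)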

August 25, 2011

SERIMI…. (Have you washed your data?)

Filed under: Linked Data,LOD,RDF,Similarity — Patrick Durusau @ 7:04 pm

SERIMI – Resource Description Similarity, RDF Instance Matching and Interlinking

From the website:

The interlinking of datasets published in the Linked Data Cloud is a challenging problem and a key factor for the success of the Semantic Web. Manual rule-based methods are the most effective solution for the problem, but they require skilled human data publishers going through a laborious, error prone and time-consuming process for manually describing rules mapping instances between two datasets. Thus, an automatic approach for solving this problem is more than welcome. We propose a novel interlinking method, SERIMI, for solving this problem automatically. SERIMI matches instances between a source and a target dataset, without prior knowledge of the data, domain or schema of these datasets. Experiments conducted with benchmark collections demonstrate that our approach considerably outperforms state-of-the-art automatic approaches for solving the interlinking problem on the Linked Data Cloud.

SERIMI-TECH-REPORT-v2.pdf

From the Results section:

The poor performance of SERIMI in the Restaurant1-Restaurant2 pair is mainly due to missing alignment in the reference set. The poor performance in the Person21-Person22 pair is due to the nature of the data. These datasets were built by adding spelling mistakes to the properties and literal values of their original datasets. Also only instances of class Person were retrieved into the pseudo-homonym sets during the interlinking process.

Impressive work overall but isn’t dirty data really the test? Just about any process can succeed with clean data.

Or is that really the weakness of the Semantic Web? That it requires clean data?

August 22, 2011

Public Dataset Catalogs Faceted Browser

Filed under: Dataset,Facets,Linked Data,RDF — Patrick Durusau @ 7:42 pm

Public Dataset Catalogs Faceted Browser

A faceted browser for the catalogs, not their content.

Filter on coverage, location, country (not sure how location and country usefully differ), catalog status (seems to mix status and data type), and managed by.

Do be aware that as the little green balloons disappear with your selection, more of the coloring of the map itself appears.

I mention that because at first it seemed the map was being colored based on the facets I chose, such as Europe suddenly turning dark green when I chose the United States in the filter. Confusing at first, and it makes me wonder: why use a map with underlying coloration anyway? A white map with borders would be a better display background for the green balloons indicating catalog locations.

BTW, if you visit a catalog and then use the back button, all your filters are reset. Not a problem now with a small set of filters and only 100 catalogs but should this resource continue to grow, that could become a usability issue.

August 20, 2011

Linked Data Patterns – New Draft

Filed under: Linked Data,Semantic Web — Patrick Durusau @ 8:04 pm

Linked Data Patterns – New Draft – Leigh Dodds and Ian Davis have released a new draft.

From the website:

A pattern catalogue for modelling, publishing, and consuming Linked Data.

Think of it as Linked Data without all the “put your hand on your computer and feel the power of URIs” stuff you hear in some quarters.

For example, the solution for “How do we publish non-global identifiers in RDF?” is:

Create a custom property, as a sub-property of the dc:identifier property, for relating the existing literal key value with the resource.

And the discussion reads:

While hackable URIs are a useful short-cut they don’t address all common circumstances. For example different departments within an organization may have different non-global identifiers for a resource; or the process and format for those identifiers may change over time. The ability to algorithmically derive a URI is useful but limiting in a global sense as knowledge of the algorithm has to be published separately to the data.

By publishing the original “raw” identifier as a literal property of the resource we allow systems to look-up the URI for the associated resource using a simple SPARQL query. If multiple identifiers have been created for a resource, or additional identifiers assigned over time, then these can be added as additional repeated properties.

For systems that may need to bridge between the Linked Data and non-Linked Data views of the world, e.g. integrating with legacy applications and databases that do not store the URI, then the ability to find the identifier for the resource provides a useful integration step.
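A minimal sketch of that pattern in Python/rdflib (the vocabulary and the key are invented):

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC, RDFS

ex = Namespace("http://example.org/")   # hypothetical vocabulary and data

g = Graph()
g.add((ex.deptId, RDFS.subPropertyOf, DC.identifier))  # the custom property
g.add((ex.product42, ex.deptId, Literal("ABC-42")))    # the raw legacy key

# The "simple SPARQL query" look-up: from legacy key to resource URI.
q = """
PREFIX ex: <http://example.org/>
SELECT ?resource WHERE { ?resource ex:deptId "ABC-42" }
"""
for row in g.query(q):
    print(row.resource)   # -> http://example.org/product42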

If I aggregate the non-Linked Data identifiers as sub-properties of dc:identifier, isn’t that a useful integration step whether I am using Linked Data or not?

The act of aggregating identifiers is a useful integration step, by whatever syntax. Yes?

My principal disagreement with Linked Data and other “universal” identification systems is that none of them are truly universal or long lasting. Rhetoric to the contrary notwithstanding.

August 10, 2011

LOD cloud diagram – Next Version

Filed under: Linked Data,LOD,Semantic Web — Patrick Durusau @ 7:17 pm

Anja Jentsch posted the following call on the public-lod@w3.org list:

we would like to thank you for putting so much effort in curating the CKAN packages for Linked Data sets since our last call.

We have compiled statistics for the 256 data sets[1] on CKAN that will be included in the next LOD Cloud: http://lod-cloud.net/state

Altogether 446 data sets are currently tagged on CKAN as LOD [2]. But the description of many of these data sets is still incomplete so that we can not find out whether they fulfil the minimal requirements for being included into the LOD cloud diagram (dereferencable URIs and RDF links to or from other data sources).

A list of data sets that we could not include yet and an explanation of what is missing can be found here: http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

Starting next week we will generate the next LOD cloud diagram [3].

Therefore we would like to invite those of you who publish data sets that we could not include yet to please review and update your entries. Please finalize your dataset descriptions until August 15th to ensure that your data set will be part of the LOD Cloud.

In order to aid you in this quest, we have provided a validation page for your CKAN entry with step-by-step guidance for the information needed:
http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

You can use the CKAN entry for DBpedia as an example:
http://ckan.net/package/dbpedia

Thank you for helping!

Cheers,
Anja, Chris and Richard

[1] http://ckan.net/package/search?q=groups:lodcloud+AND+-tags:lodcloud.unconnected+AND+-tags:lodcloud.needsfixing
[2] http://ckan.net/tag/lod
[3] http://lod-cloud.net/

Just a reminder, today is the 10th of August so don’t wait to review your entry.

Whatever your approach, we all benefit from cleaner data.

August 1, 2011

International Bibliographic Standards, Linked Data, and the Impact on Library Cataloging

Filed under: Conferences,FRBR,Linked Data — Patrick Durusau @ 3:49 pm

International Bibliographic Standards, Linked Data, and the Impact on Library Cataloging

Webinar
August 24, 2011
1:00 – 2:30 p.m. (Eastern Time)

From the notice:

The International Federation of Library Associations and Institutions (IFLA) is responsible for the development and maintenance of International Standard Bibliographic Description (ISBD), UNIMARC, and the “Functional Requirements” family for bibliographic records (FRBR), authority data (FRAD), and subject authority data (FRSAD). ISBD underpins the MARC family of formats used by libraries world-wide for many millions of catalog records, while FRBR is a relatively new model optimized for users and the digital environment. These metadata models, schemas, and content rules are now being expressed in the Resource Description Framework language for use in the Semantic Web.

This webinar provides a general update on the work being undertaken. It describes the development of an Application Profile for ISBD to specify the sequence, repeatability, and mandatory status of its elements. It discusses issues involved in deriving linked data from legacy catalogue records based on monolithic and multi-part schemas following ISBD and FRBR, such as the duplication which arises from copy cataloging and FRBRization. The webinar provides practical examples of deriving high-quality linked data from the vast numbers of records created by libraries, and demonstrates how a shift of focus from records to linked-data triples can provide more efficient and effective user-centered resource discovery services.

This is not a free webinar, but registration means that if you miss it on the 24th of August, you will still have access to the recorded proceedings for one year.

July 7, 2011

JSON-LD – Expressing Linked Data in JSON

Filed under: JSON,Linked Data — Patrick Durusau @ 4:29 pm

JSON-LD – Expressing Linked Data in JSON

I recently mentioned a mailing list on Linked Data in JSON.

From the webpage:

JSON-LD (JavaScript Object Notation for Linked Data) is a lightweight Linked Data format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web-scale. If you are already familiar with JSON, writing JSON-LD is very easy. There is a smooth migration path from the JSON you use today, to the JSON-LD you will use in the future. These properties make JSON-LD an ideal Linked Data interchange language for JavaScript environments, Web services, and unstructured databases such as CouchDB and MongoDB.

Short example or two plus links to other resources.
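For flavor, here is a minimal JSON-LD document sketched as a Python dict (the data is invented): the @context maps plain JSON keys onto vocabulary URIs, and @id names the thing being described.

import json

doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage",
                     "@type": "@id"},   # values of this key are IRIs
    },
    "@id": "http://example.org/people/alice",
    "name": "Alice",
    "homepage": "http://example.org/alice/",
}

print(json.dumps(doc, indent=2))   # still plain JSON to any non-LD consumer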

July 6, 2011

A Survey On Data Interlinking Methods

Filed under: Linked Data,LOD,RDF — Patrick Durusau @ 2:11 pm

A Survey On Data Interlinking Methods by Stephan Wölger, Katharina Siorpaes, Tobias Bürger, Elena Simperl, Stefan Thaler, and Christian Hofer.

From the introduction:

In 2007 the Linking Open Data (LOD) community project started an initiative which aims at increased use of Semantic Web applications. Such applications on the one hand provide new means to enrich a user’s web experience but on the other hand also require certain standards to be adhered to. Two important requirements when it comes to Semantic Web applications are the availability of RDF datasets on the web and having typed links between these datasets in order to be able to browse the data and to jump between them in various directions.

While there exist tools that create RDF output automatically from the application level and tools that create RDF from web sites, interlinking the resulting datasets is still a task that can be cumbersome for humans (either because there is a lack of incentives or due to the non-availability of user friendly tools) or not doable for machines (due to the manifoldness of domains). Despite the fact that there are more and more interlinking tools available, those either can be applied only for certain domains of the real world (e.g. publications) or they can be used just for interlinking a specific type of data (e.g. multimedia data).

Another interesting survey article from the Semantic Technology Institute (STI) Innsbruck, University of Innsbruck.

I like the phrase “…manifoldness of domains.” RDF output is useful information about data. The problem I foresee is that the semantics it represents are local, hence the “manifoldness of domains.” Not always: there are some domains that are so close as to be indistinguishable, one from the other, and there linking RDF will work quite well.

One imagines that RDF-based interlinking of OfficeDepot, Staples and OfficeMax should not be difficult. Tiresome, not terribly interesting, but not difficult. And that could prove to be useful for personal and corporate buyers seeking price breaks or competitors trying to decide on loss leaders. Not a lot of reasoning to be done except by the buyers and sellers.

I am sure there would still be some domain differences between those vendors but having a common mapping from one vendor number to all three vendor numbers could prove to be very useful for customers and distributors alike.
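Something as plain as the following sketch would do, whatever syntax it is ultimately published in (all part numbers invented):

# Each set groups the identifiers the three vendors use for one product.
same_product = [
    {"officedepot:OD-1017", "staples:STP-88342", "officemax:OM-55A"},
    {"officedepot:OD-2230", "staples:STP-10991"},
]

def aliases(part_number):
    """Return every vendor's number for a product, given any one of them."""
    for group in same_product:
        if part_number in group:
            return group
    return {part_number}

print(aliases("staples:STP-88342"))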

For more complex/abstract domains, where the “manifoldness of domains” is an issue, you can use topic maps.

June 29, 2011

Providing and discovering definitions of URIs

Filed under: Identifiers,Linked Data,LOD,OWL,RDF,Semantic Web — Patrick Durusau @ 9:10 am

Providing and discovering definitions of URIs by Jonathan A. Rees.

Abstract:

The specification governing Uniform Resource Identifiers (URIs) [rfc3986] allows URIs to mean anything at all, and this unbounded flexibility is exploited in a variety of contexts, notably the Semantic Web and Linked Data. To use a URI to mean something, an agent (a) selects a URI, (b) provides a definition of the URI in a manner that permits discovery by agents who encounter the URI, and (c) uses the URI. Subsequently other agents may not only understand the URI (by discovering and consulting the definition) but may also use the URI themselves.

A few widely known methods are in use to help agents provide and discover URI definitions, including RDF fragment identifier resolution and the HTTP 303 redirect. Difficulties in using these methods have led to a search for new methods that are easier to deploy, and perform better, than the established ones. However, some of the proposed methods introduce new problems, such as incompatible changes to the way metadata is written. This report brings together in one place information on current and proposed practices, with analysis of benefits and shortcomings of each.

The purpose of this report is not to make recommendations but rather to initiate a discussion that might lead to consensus on the use of current and/or new methods.

The criteria for success:

  1. Simple. Having too many options or too many things to remember makes discovery fragile and impedes uptake.
  2. Easy to deploy on Web hosting services. Uptake of linked data depends on the technology being accessible to as many Web publishers as possible, so should not require control over Web server behavior that is not provided by typical hosting services.
  3. Easy to deploy using existing Web client stacks. Discovery should employ a widely deployed network protocol in order to avoid the need to deploy new protocol stacks.
  4. Efficient. Accessing a definition should require at most one network round trip, and definitions should be cacheable.
  5. Browser-friendly. It should be possible to configure a URI that has a discoverable definition so that ‘browsing’ to it yields information useful to a human.
  6. Compatible with Web architecture. A URI should have a single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name.

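The HTTP 303 mechanism mentioned above is easy to observe from code. A small sketch (the URI is hypothetical) that dereferences a URI without following redirects, so the 303 itself stays visible:

import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, *args, **kwargs):
        return None   # do not follow: we want to inspect the 303 itself

def probe(uri):
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(uri)
        print(uri, "->", resp.status, "(direct answer, no redirect)")
    except urllib.error.HTTPError as e:
        # A 303 lands here because we refused to follow it.
        print(uri, "->", e.code, "Location:", e.headers.get("Location"))

probe("http://example.org/id/alice")   # hypothetical 303-style URI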

I had to look it up to get the page number but I remembered Karl Wiegers in Software Requirements saying:

Feasible

It must be possible to implement each requirement within the known capabilities and limitations of the system and its environment.

The “single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name” requirement is not feasible. It will stymie this project, despite the array of talent on hand, until it is no longer a requirement.

Need proof? Name one URI with a single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name.

Not one that the W3C TAG, or TBL or anyone else thinks/wants/prays has a single agreed meaning globally, … but one that in fact has such a global meaning.

It’s been more than ten years. Let’s drop the last requirement and let the rather talented group working on this come up with a solution that meets the other five (5) requirements.

It won’t be a universal solution but then neither is the WWW.

R2R Framework

Filed under: LDIF,Linked Data,R2R — Patrick Durusau @ 9:03 am

R2R Framework

The R2R Framework is used by the LDIF – Linked Data Integration Framework.

The R2R User Manual contains the specification and will likely be of more use than the website.

LDIF – Linked Data Integration Framework

Filed under: LDIF,Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 9:02 am

LDIF – Linked Data Integration Framework 0.1

From the webpage:

The Web of Linked Data grows rapidly and contains data from a wide range of different domains, including life science data, geographic data, government data, library and media data, as well as cross-domain datasets such as DBpedia or Freebase. Linked Data applications that want to consume data from this global data space face the challenges that:

  1. data sources use a wide range of different RDF vocabularies to represent data about the same type of entity.
  2. the same real-world entity, for instance a person or a place, is identified with different URIs within different data sources.

This usage of different vocabularies as well as the usage of URI aliases makes it very cumbersome for an application developer to write SPARQL queries against Web data which originates from multiple sources. In order to ease using Web data in the application context, it is thus advisable to translate data to a single target vocabulary (vocabulary mapping) and to replace URI aliases with a single target URI on the client side (identity resolution), before starting to ask SPARQL queries against the data.

Up-till-now, there have not been any integrated tools that help application developers with these tasks. With LDIF, we try to fill this gap and provide an initial alpha version of an open-source Linked Data Integration Framework that can be used by Linked Data applications to translate Web data and normalize URI aliases.

More comments will follow, but…

Isn’t this the reverse of the well-known synonym table in IR?

Instead of substituting synonyms in the query expression, the underlying data is being transformed to produce…a lack of synonyms?

No, not quite the reverse of a synonym table: in synonym-table terms, we would lose the synonym table and transform the underlying textual data to use only a single term where before there were N terms, all of which occurred in the synonym table.

If I search for a term previously listed in the synonym table, but one replaced by a common term, my search result will be empty.

No more synonyms? That sounds like a bad plan to me.
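To make the contrast concrete, a toy sketch (terms invented): query-time expansion keeps every variant findable, while rewriting the data loses them.

synonyms = {"car": {"car", "auto", "automobile"}}
documents = ["my auto is red", "the automobile broke down"]

# IR-style: expand the query, leave the data alone.
def search_expanded(term):
    variants = synonyms.get(term, {term})
    return [d for d in documents if any(v in d for v in variants)]

# LDIF-style: rewrite the data to one canonical term, then search literally.
canonical = {"auto": "car", "automobile": "car"}
normalized = [" ".join(canonical.get(w, w) for w in d.split())
              for d in documents]

def search_normalized(term):
    return [d for d in normalized if term in d]

print(search_expanded("car"))      # finds both documents
print(search_normalized("auto"))   # empty: the variant term is gone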

June 23, 2011

Linked Data in Linguistics March 7 – 9, 2012, Frankfurt/Main, Germany

Filed under: Conferences,Linguistics,Linked Data — Patrick Durusau @ 1:53 pm

Linked Data in Linguistics March 7 – 9, 2012, Frankfurt/Main, Germany

Important Dates:

August 7, 2011: Deadline for extended abstracts (four pages plus references)
September 9, 2011: Notification of acceptance
October 23, 2011: One-page abstract for DGfS conference proceedings
December 1, 2011: Camera-ready papers for workshop proceedings (eight pages plus references)
March 7-9, 2012: Workshop
March 6-9, 2012: Conference

From the website:

The explosion of information technology has led to a substantial growth in quantity, diversity and complexity of web-accessible linguistic data. These resources become even more useful when linked. This workshop will present principles, use cases, and best practices for using the linked data paradigm to represent, exploit, store, and connect different types of linguistic data collections.

Recent relevant developments include: (1) Language archives for language documentation, with audio, video, and text transcripts from hundreds of (endangered) languages (e.g. Dobes). (2) Typological databases with typological and geographical data about languages from all parts of the globe (e.g. WALS). (3) Development, distribution and application of lexical-semantic resources (LSRs) in NLP (e.g. WordNet). (4) Multi-layer annotations (e.g. ISO TC37/SC4) and semantic annotation of corpora (e.g. PropBank) by corpus linguists and computational linguists, often accompanied by the interlinking of corpora with LSRs (e.g. OntoNotes).

The general trend of providing data online is accompanied by newly developing possibilities to link linguistic data and metadata. This may include general data sources (e.g. DBpedia.org), but also repositories with specific linguistic information about languages (Multitree.org, LL-MAP, ISO 639-3), as well as about linguistic categories and phenomena (GOLD, ISOcat).

Originally noticed this from a tweet by Lutz Maicher.

June 6, 2011

2nd Workshop on the Multilingual Semantic Web

Filed under: Conferences,Cross-lingual,Linked Data,Multilingual — Patrick Durusau @ 1:58 pm

2nd Workshop on the Multilingual Semantic Web

Collocated with the 10th International Semantic Web Conference (ISWC2011) in Bonn, Germany.

Important Dates

August 15th – submission deadline
September 5th – notification
September 10th – camera-ready deadline
October 23rd or 24th – workshop

Abstract:

Given the substantial growth of Web users that create and update knowledge all over the world in languages other than English, multilingualism has become an issue of major interest for the Semantic Web community. This process has been accelerated due to initiatives such as the Linked Data project, which encourages not only governments and public institutes to make their data available to the public, but also private organizations in domains such as medicine, geography, music etc. These actors often publish their data sources in their respective languages, and as such, in order to make this information interoperable and accessible to members of other linguistic communities, multilingual knowledge representation, access and translation are an impending need.

Items of special focus:

  • representation of multilingual information and language resources in Semantic Web and Linked Data formats
  • cross-lingual discovery and representation of mappings between multilingual Linked Data vocabularies and datasets
  • cross-lingual querying of knowledge repositories and Linked Data
  • machine translation and localization strategies for the Semantic Web

The first three are classic topic map fare and the last one isn’t that much of a reach.

June 2, 2011

Second International Workshop on Consuming Linked Data (COLD2011)

Filed under: Linked Data,LOD — Patrick Durusau @ 7:42 pm

Second International Workshop on Consuming Linked Data (COLD2011)

Important Dates:

Paper submission deadline: August 15, 2011, 23.59 Hawaii time
Acceptance notification: September 6, 2011
Camera-ready versions of accepted papers: September 15, 2011
Workshop date: October 23 or 24, 2011

From the website:

Abstract:

The quantity of published Linked Data is increasing dramatically. However, applications that consume Linked Data are not yet widespread. Current approaches lack methods for seamless integration of Linked Data from multiple sources, dynamic discovery of available data and data sources, provenance and information quality assessment, application development environments, and appropriate end user interfaces. Addressing these issues requires well-founded research, including the development and investigation of concepts that can be applied in systems which consume Linked Data from the Web. Following the success of the 1st International Workshop on Consuming Linked Data, we organize the second edition of this workshop in order to provide a platform for discussion and work on these open research problems. The main objective is to provide a venue for scientific discourse – including systematic analysis and rigorous evaluation – of concepts, algorithms and approaches for consuming Linked Data.

Err “…lack methods for seamless integration of Linked Data from multiple sources…” has topic maps written all over it.

June 1, 2011

Silk – A Link Discovery Framework for the Web of Data

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 6:52 pm

Silk – A Link Discovery Framework for the Web of Data

From the website:

The Web of Data is built upon two simple ideas: First, to employ the RDF data model to publish structured data on the Web. Second, to set explicit RDF links between data items within different data sources. Background information about the Web of Data is found at the wiki pages of the W3C Linking Open Data community effort, in the overview article Linked Data – The Story So Far and in the tutorial on How to publish Linked Data on the Web.

The Silk Link Discovery Framework supports data publishers in accomplishing the second task. Using the declarative Silk – Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked. These link conditions may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language. Silk accesses the data sources that should be interlinked via the SPARQL protocol and can thus be used against local as well as remote SPARQL endpoints.

Of particular interest are the comparison operators:

A comparison operator evaluates two inputs and computes their similarity based on a user-defined metric.
The Silk framework currently supports the following similarity metrics, which return a similarity value between 0 (lowest similarity) and 1 (highest similarity) each:

  • levenshtein([float maxDistance], [float minValue], [float maxValue]): String similarity based on the Levenshtein metric.
  • jaro: String similarity based on the Jaro distance metric.
  • jaroWinkler: String similarity based on the Jaro-Winkler metric.
  • qGrams(int q): String similarity based on q-grams (by default q=2).
  • equality: Returns 1 if the strings are equal, 0 otherwise.
  • inequality: Returns 0 if the strings are equal, 1 otherwise.
  • num(float maxDistance, float minValue, float maxValue): Computes the numeric distance between two numbers and normalizes it using the threshold. Parameters: maxDistance (the similarity score is 0.0 if the distance is bigger than maxDistance); minValue, maxValue (the minimum and maximum values which occur in the datasource).
  • date(int maxDays): Computes the similarity between two dates (“YYYY-MM-DD” format). At a difference of maxDays the metric evaluates to 0, progressing towards 1 as the difference shrinks.
  • wgs84(string unit, float threshold, string curveStyle): Computes the geographical distance between two points. Parameters: unit (the unit in which the distance is measured; allowed values: “meter” or “m” (default), “kilometer” or “km”); threshold (results in 0 for all values bigger than the threshold, values below vary with the curveStyle); curveStyle (“linear” gives a linear transition, “logistic” uses the logistic function f(x)=1/(1+e^(-x)) for a softer curve with a slow slope at the start and end but a steep one in the middle).

Author: Konrad Höffner (MOLE subgroup of Research Group AKSW, University of Leipzig)

(better formatting is available at the original page but I thought the operators important enough to report in full here)

Definitely a step towards more than opaque mapping between links. Note for example that Silk – Link Specification Language declares why two or more links are mapped together. More could be said but this is a start in the right direction.
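For the curious, here is a rough Python sketch of the kind of normalised Levenshtein score the table describes (the exact normalisation Silk applies is my assumption, not its published formula):

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def levenshtein_similarity(a, b, max_distance):
    """0 beyond maxDistance, rising linearly to 1 for identical strings."""
    d = levenshtein(a, b)
    return 0.0 if d > max_distance else 1.0 - d / max_distance

print(levenshtein_similarity("Beethoven", "Beethovem", 3))   # ~0.67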

Elsevier/Tetherless World Health and Life Sciences Hackathon (27-28 June 2011)

Filed under: Conferences,Linked Data — Patrick Durusau @ 6:50 pm

Elsevier/Tetherless World Health and Life Sciences Hackathon (27-28 June 2011)

From the announcement:

The Tetherless World Constellation at RPI is excited to announce that TWC and the SciVerse team at Elsevier are planning a 24-hour Health and Life Sciences Semantic Web Hackathon to be held 27-28 June 2011. The Elsevier-sponsored event will be held at the beautiful Pat’s Barn, on the campus of the Rensselaer Technology Park.

Participants will compete against each other to develop apps using linked data from TWC and other sources, web APIs from Elsevier SciVerse, and visualization and other resources from around the Web.

Registration at: http://twcsciverse2011.eventbrite.com/

You won’t see much other than Pat’s Barn but it is a 24-hour hackathon and there are prizes!

Using topic maps to make linked data links less semantically opaque comes to mind.

May 24, 2011

Music Linked Data Workshop (JISC, London, 12 May 2011)

Filed under: Linked Data,Music Retrieval — Patrick Durusau @ 10:24 am

Slides from the Music Linked Data Workshop (JISC, London, 12 May 2011)

Here you will find:

  • MusicNet: Aligning Musicology’s Metadata – David Bretherton, Daniel Alexander Smith, Joe Lambert and mc schraefel (Music, and Electronics and Computer Science, University of Southampton)
  • Towards Web-Scale Analysis of Musical Structure – J. Stephen Downie (Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign), David De Roure (Oxford e-Research Centre, University of Oxford) and Kevin Page (Oxford e-Research Centre, University of Oxford)
  • LinkedBrainz Live – Simon Dixon, Cedric Mesnage and Barry Norton (Centre for Digital Music, Queen Mary University of London)
  • BBC Music – Using the Web as our Content Management System – Nicholas Humfrey (BBC)
  • Early Music Online: Opening up the British Library’s 16th-Century Music Books – Sandra Tuppen (British Library)
  • Musonto – A Semantic Search Engine Dedicated to Music and Musicians – Jean-Philippe Fauconnier (Université Catholique de Louvain, Belgium) and Joseph Roumier (CETIC, Belgium)
  • Listening to Movies – Creating a User-Centred Catalogue of Music for Films – Charlie Inskip (freelance music consultant)

These look like good candidates for the further review inbox!

May 20, 2011

Seevl

Filed under: Dataset,Interface Research/Design,Linked Data,Music Retrieval,Semantic Web — Patrick Durusau @ 4:04 pm

Seevl: Reinventing Music Discovery

If you are interested in music or interfaces, this is a must stop location!

Simple search box.

I tried searching for artists, albums, types of music.

In addition to search results you also get suggestions of related information.

The “Why is this related?” link for related information was particularly interesting. It explains why additional information was offered for a particular search result.

Developers can access their data for non-commercial uses for free.

The simplicity of the interface was a real plus.

May 18, 2011

Datalift

Filed under: Dataset,Linked Data,Semantic Web — Patrick Durusau @ 6:42 pm

Datalift (also available in French)

From the webpage:

Datalift brings raw structured data coming from various formats (relational databases, CSV, XML, …) to semantic data interlinked on the Web of Data.

Datalift is an experimental research project funded by the French national research agency. Its goal is to develop a platform to publish and interlink datasets on the Web of data. Datalift will both publish datasets coming from a network of partners and data providers and propose a set of tools for easing the datasets publication process.

A few steps to data heaven

The project will provide tools allowing to facilitate each step of the publication process:

  • selecting ontologies for publishing data
  • converting data to the appropriate format (RDF using the selected ontology)
  • publishing the linked data
  • interlinking data with other data sources

The project is funded for three years so it needs to hit the ground running.

I am sure they would appreciate useful feedback.

May 10, 2011

Special Issue on Linked Data for Science and Education

Filed under: Linked Data,LOD — Patrick Durusau @ 3:29 pm

Special Issue on Linked Data for Science and Education

The Semantic Web Journal has posted a call for papers on linked data for science and education.

Important dates:

Deadline for submissions: May 31 2011
Reviews due: July 15 2011
Final versions of accepted papers due: August 12 2011

Apologies, I missed this announcement when it came out in early February, 2011.

From the call:

The number of universities, research organizations, publishers and funding agencies contributing to the Linked Data cloud is constantly increasing. The Linked Data paradigm has been identified as a lightweight approach for data dissemination and integration, opening up new opportunities for the organization, integration, archiving and retrieval of research results and educational material. Obviously, this novel approach also raises new challenges regarding the integrity, adoption, use and sustainability of contents. A number of case studies from universities and research communities already demonstrate that Linked Data is not merely a novel way of exposing data on the Web, but that its principles help integrating related data, connecting scientists working on related topics, and improving scientific and educational workflows. The next challenges in creating a true Web of scientific and educational data include dealing with provenance, mapping vocabularies (i.e., ontologies), and organizational issues such as assessing costs and ensuring persistence and performance. In this special issue of the Semantic Web Journal, we want to collect the state of the art in Linked Data for science and education and identify upcoming challenges, focusing on technological aspects as well as social and legal implications.

Well, I like that:

The next challenges in creating a true Web of scientific and educational data include dealing with provenance, mapping vocabularies (i.e., ontologies), and organizational issues such as assessing costs and ensuring persistence and performance.

Link data together and then hope we can sort it out on the other end.

Doesn’t that sound a lot like Google?

Index data together and then hope we can sort it out on the other end.

May 8, 2011

Linked Data in JSON

Filed under: JSON,Linked Data — Patrick Durusau @ 6:15 pm

A mailing list has been created for Linked Data in JSON.

Manu Sporny has posted Updated JSON-LD Draft with a summary of changes and links for those already familiar with the draft.

You will be encountering it so it will be helpful to follow the discussion.

May 3, 2011

PoolParty

Filed under: Linked Data,Semantic Web,SKOS,Thesaurus — Patrick Durusau @ 1:07 pm

PoolParty

From the website:

PoolParty is a thesaurus management system and a SKOS editor for the Semantic Web including text mining and linked data capabilities. The system helps to build and maintain multilingual thesauri providing an easy-to-use interface. PoolParty server provides semantic services to integrate semantic search or recommender systems into enterprise systems like CMS, web shops, CRM or Wikis.

I encountered PoolParty in the video Pool Party – Semantic Search.

The video elides a lot of difficulties, but what effective advertising doesn’t?

Curious if anyone is familiar with this group/product?


Update: 31 May 2011

Slides: Pool Party – Semantic Search

Nice slide deck on semantic search issues.

April 28, 2011

Dataset linkage recommendation on the Web of Data

Filed under: Conferences,Entity Resolution,Linked Data,LOD — Patrick Durusau @ 3:18 pm

Dataset linkage recommendation on the Web of Data by Martijn van der Plaat (Master thesis).

Abstract:

We address the problem of, given a particular dataset, which candidate dataset(s) from the Web of Data have the highest chance of holding co-references, in order to increase the efficiency of coreference resolution. Currently, data publishers manually discover and select the right dataset to perform a co-reference resolution. However, in the near future the size of the Web of Data will be such that data publishers can no longer determine which datasets are candidate to map to. A solution for this problem is finding a method to automatically recommend a list of candidate datasets from the Web of Data and present this to the data publisher as an input for the mapping.

We proposed two solutions to perform the dataset linkage recommendation. The general idea behind our solutions is predicting the chance a particular dataset on the Web of Data holds co-references with respect to the dataset from the data publisher. This prediction is done by generating a profile for each dataset from the Web of Data. A profile is meta-data that represents the structure of a dataset, in terms of used vocabularies, class types, and property types. Subsequently, dataset profiles that correspond with the dataset profile from the data publisher, get a specific weight value. Datasets with the highest weight values have the highest chance of holding co-references.
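The basic idea in miniature (URIs invented; plain Jaccard overlap stands in for the thesis's weighting scheme):

# A "profile" here is just the set of vocabulary terms a dataset uses.
profiles = {
    "my-dataset":  {"foaf:Person", "foaf:name", "dc:title"},
    "candidate-a": {"foaf:Person", "foaf:name", "geo:lat"},
    "candidate-b": {"skos:Concept", "skos:prefLabel"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

source = profiles["my-dataset"]
ranked = sorted(((name, jaccard(source, p))
                 for name, p in profiles.items() if name != "my-dataset"),
                key=lambda pair: pair[1], reverse=True)
print(ranked)   # candidate-a outranks candidate-b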

A useful exercise but what happens when data sets have inconsistent profiles from different sources?

And for all the drum banging, only a very tiny portion of all available datasets are part of Linked Data.

How do we evaluate the scalability of such a profiling technique?

March 23, 2011

Linked Data: Evolving the Web into a Global Data Space (The Online Book)

Filed under: Linked Data,RDF,Topic Maps — Patrick Durusau @ 6:01 am

Linked Data: Evolving the Web into a Global Data Space (The Online Book)

The Principles of Linked Data:

1. Use URIs as names for things.
2. Use HTTP URIs, so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.
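Principle 3 in miniature: when looking up a name, ask for data rather than a web page. A sketch (the URI is hypothetical; any Linked Data server would do):

import urllib.request

req = urllib.request.Request(
    "http://example.org/id/alice",
    headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.headers.get("Content-Type"))
    print(resp.read(300).decode("utf-8", errors="replace"))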

First observation/question

What made the original WWW proposal different from all hypertext systems before it?

There had been a number of hypertext systems before the WWW, some with capabilities that the WWW continues to lack.

But what made it different or, perhaps better, successful?

That links could fail?

Oh, but if we are going to have a global data space that identifies stuff, it can’t fail. Yes?

So, we are taking a flexible, fault tolerant system (the World Wide Web) and making it into an inflexible, brittle system (the Semantic Web).

That sounds like a very, very bad plan.

Second observation/question

Global data space?

Even allowing for marketing puff, that is a bit of a stretch. Well, more than that, it is an outright lie.

Consider all the data that is now being collected by the Large Hadron Collider at CERN. So much data that some of it has to be discarded. Simply can’t keep it all.

Or all the data from previous space missions and astronomical observations, both visible and in other bands.

Or all the legal (and one assumes illegal) records of government activity.

Or all the other information, records, data from human activity.

And not just the documents, but the stuff people talk about in them and the relationships between the things they talk about.

Some of that can be addressed or obtained over the web, but that isn’t the same thing as identifying all the stuff talked about in that material on the WWW.

Now, if Linked Data wanted to claim that the WWW was a global data space for information of interest to a particular group, well, that comes closer to being believable at least.

*****

However silly a single, unifying data model may sound, it is true that making data more accessible, by any means, makes it easier to make sensible use of it.

Despite having a drank-the-Kool-Aid perspective on linked data, this book is a useful introduction to it as a technology.

Ignore the “…put your hand on the radio and feel the power…” type stuff.

Keep saying to yourself: “it’s just another format, it’s just another format…,” and you will be fine.

March 10, 2011

Pentaho BI Suite Enterprise Edition (TM/SW Are You Listening?)

Filed under: BI,Linked Data,Marketing,Semantic Web — Patrick Durusau @ 8:12 am

Pentaho BI Suite Enterprise Edition

From the website:

Pentaho is the open source business intelligence leader. Thousands of organizations globally depend on Pentaho to make faster and better business decisions that positively impact their bottom lines. Download the Pentaho BI Suite today if you want to speed your BI development, deploy on-premise or in the cloud or cut BI licensing costs by up to 90%.

There are several open source offerings like this, Talend is another one that comes to mind.

I haven’t looked at its data integration in detail but suspect I know the answer to the question:

Say I have an integration of some BI assets using Pentaho and other BI assets integrated using Talend, how do I integrate those together while maintaining the separately integrated BI assets?

Or for that matter, how do I integrate BI that has been gathered and integrated by others, say Lexis/Nexis?

Interesting too to note that this is the sort of user slickness and ease that topic maps and (cough) linked data (see, I knew I could say it) face in the marketplace.

Does it offer all the bells and whistles of more sophisticated subject identity or reasoning approaches?

No, but if it offers all that users are interested in using, what is your complaint?

Both topic maps and semantic web/linked data approaches need to listen more closely to what users want.

As opposed to deciding what users need.

And delivering the latter instead of the former.

February 18, 2011

Linked Data: Evolving the Web into a Global Data Space – Book

Filed under: Linked Data — Patrick Durusau @ 5:19 am

Linked Data: Evolving the Web into a Global Data Space by Tom Heath and Christian Bizer.

Abstract:

The World Wide Web has enabled the creation of a global information space comprising linked documents. As the Web becomes ever more enmeshed with our daily lives, there is a growing desire for direct access to raw data not currently available on the Web or bound up in hypertext documents. Linked Data provides a publishing paradigm in which not only documents, but also data, can be a first class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards – the Web of Data. In this Synthesis lecture we provide readers with a detailed technical introduction to Linked Data. We begin by outlining the basic principles of Linked Data, including coverage of relevant aspects of Web architecture. The remainder of the text is based around two main themes – the publication and consumption of Linked Data. Drawing on a practical Linked Data scenario, we provide guidance and best practices on: architectural approaches to publishing Linked Data; choosing URIs and vocabularies to identify and describe resources; deciding what data to return in a description of a resource on the Web; methods and frameworks for automated linking of data sets; and testing and debugging approaches for Linked Data deployments. We give an overview of existing Linked Data applications and then examine the architectures that are used to consume Linked Data from the Web, alongside existing tools and frameworks that enable these. Readers can expect to gain a rich technical understanding of Linked Data fundamentals, as the basis for application development, research or further study.

A free HTML version is reported to be due out 1 March 2011.

Unless I am seriously mistaken (definitely a possibility), all our data, and the structures that hold our data, already have semantics, but thanks for asking.

To enable semantic integration we need to make those semantics explicit but that hardly requires conversion into Linked Data format.

Any more than Linked Data format enables more linking than it takes away.

As a matter of fact it takes away a lot of linking, at least if you follow its logic, because linked data can only link to other linked data. How unfortunate.

The other question I will have to ask, after a decent period following the appearance of the book, is what about the data structures of Linked Data? Do they also qualify as first class citizens of the Web?

Linked Data-a-thon – ISWC 2011

Filed under: Conferences,Linked Data,Marketing,Semantic Web — Patrick Durusau @ 5:17 am

Linked Data-a-thon http://iswc2011.semanticweb.org/calls/linked-data-a-thon/

I looked at the requirements for the Linked Data-a-thon, which include:

  • make use of Linked Data consumed from multiple data sources
  • be able to make use of additional data from other Linked Data sources
  • be accessible from the Web
  • satisfy the special requirement which will be announced on October 1, 2011.

It would not be hard to fashion a topic map application that consumed Linked Data, made use of additional data from other Linked Data sources and was accessible from the Web.

What would be interesting would be to reliably integrate other information sources, that were not Linked Data with Linked Data sources.

Don’t know about the special requirement.

One person in a team of people would actually have to be attending the conference to enter.

Anyone interested in discussing such an entry?

Suggested Team title: Linked Data Cake (1 Tsp Linked Data, 8 Cups Non-Linked Data, TM Oven – Set to Merge)

Kinda long and pushy but why not?

What better marketing pitch for topic maps than to leverage present investments in Linked Data into a meaningful result with non-linked data.

It isn’t like there is a shortage of non-linked data to choose from. 😉

DataLift

Filed under: Dataset,Linked Data,Ontology,RDF — Patrick Durusau @ 5:12 am

DataLift

The DataLift project will no doubt produce some useful tools and output but reading its self-description:

The project will provide tools allowing to facilitate each step of the publication process:

  1. selecting ontologies for publishing data
  2. converting data to the appropriate format (RDF using the selected ontology)
  3. publishing the linked data
  4. interlinking data with other data sources

I am struck by how futile the effort sounds in the face of petabytes of data flow, changing semantics of that data and changing semantics of other data with which it might be interlinked.

The nearest imagery I can come up with is trying to direct the flow of a tsunami with a roll of paper towels.

It is certainly brave (I forgo usage of the other term) to try but ultimately isn’t very productive.

First, any scheme that starts with conversion to a particular format is an automatic loser.

The source format is itself composed of subjects that are discarded by the conversion process.

Moreover, what if we disagree about the conversion?

Remember all the semantic diversity that gave rise to this problem? Where did it get off to?

Second, the interlinking step introduces brittleness into the process.

Both in terms of the ontology that any particular data must follow but also in terms of resolution of any linkage.

Other data sources can only be linked in if they use the correct ontology and format. And that assumes they are reachable.

I hope the project does well, but at best it will result in another semantic flavor to be integrated using topic maps.

*****
PS: The use of “data heaven” betrays the religious nature of the Linked Data movement. I don’t object to Linked Data. What I object to is the missionary conversion aspect of Linked Data.

