Silk – A Link Discovery Framework for the Web of Data
From the website:
The Web of Data is built upon two simple ideas: First, to employ the RDF data model to publish structured data on the Web. Second, to set explicit RDF links between data items within different data sources. Background information about the Web of Data is found at the wiki pages of the W3C Linking Open Data community effort, in the overview article Linked Data – The Story So Far and in the tutorial on How to publish Linked Data on the Web.
The Silk Link Discovery Framework supports data publishers in accomplishing the second task. Using the declarative Silk – Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked. These link conditions may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language. Silk accesses the data sources that should be interlinked via the SPARQL protocol and can thus be used against local as well as remote SPARQL endpoints.
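As a rough illustration of what access "via the SPARQL protocol" means in practice, here is a minimal Python sketch that builds (but does not send) a SPARQL query request over HTTP GET. The endpoint URL, the query, and the function name build_sparql_request are hypothetical and chosen only for illustration; this is not how Silk itself is implemented or configured.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL protocol query request via HTTP GET."""
    # The SPARQL protocol passes the query text in the "query" URL parameter.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical local endpoint and query, for illustration only.
request = build_sparql_request(
    "http://localhost:8890/sparql",
    "SELECT ?s ?label WHERE { "
    "?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 10",
)
```

The same function would work unchanged against a remote endpoint, which is the point made above: only the endpoint URL differs between local and remote sources.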
Of particular interest are the comparison operators:
A comparison operator evaluates two inputs and computes their similarity based on a user-defined metric.
The Silk framework currently supports the following similarity metrics, which return a similarity value between 0 (lowest similarity) and 1 (highest similarity) each:
Metric
Description

levenshtein([float maxDistance], [float minValue], [float maxValue])
String similarity based on the Levenshtein metric.

jaro
String similarity based on the Jaro distance metric.

jaroWinkler
String similarity based on the Jaro-Winkler metric.

qGrams(int q)
String similarity based on q-grams (by default q=2).

equality
Returns 1 if the strings are equal, 0 otherwise.

inequality
Returns 0 if the strings are equal, 1 otherwise.

num(float maxDistance, float minValue, float maxValue)
Computes the numeric distance between two numbers and normalizes it using the threshold.
Parameters:
maxDistance
The similarity score is 0.0 if the distance is greater than maxDistance.
minValue, maxValue
The minimum and maximum values that occur in the data source.

date(int maxDays)
Computes the similarity between two dates (“YYYY-MM-DD” format). At a difference of “maxDays”, the metric evaluates to 0 and progresses towards 1 with a lower difference.

wgs84(string unit, float threshold, string curveStyle)
Computes the geographical distance between two points.
Parameters:
unit
The unit in which the distance is measured. Allowed values: “meter” or “m” (default), “kilometer” or “km”.
threshold
Distances greater than the threshold result in a similarity of 0; values below it vary with the curveStyle.
curveStyle
“linear” gives a linear transition; “logistic” uses the logistic function f(x)=1/(1+e^(x)), which gives a softer curve with a gentle slope at the start and the end of the curve but a steep one in the middle.
Author: Konrad Höffner (MOLE subgroup of Research Group AKSW, University of Leipzig)
(better formatting is available at the original page but I thought the operators important enough to report in full here)
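To make the distance-based metrics concrete, here is a minimal Python sketch of how levenshtein, num, date and wgs84 (with a “linear” curveStyle, in meters) might be computed. This is not Silk's implementation. The normalizations (one minus the distance divided by the longer string length or by the threshold), the use of the haversine formula, the Earth radius constant, and every function name below are my own assumptions for illustration.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def linear_similarity(distance: float, threshold: float) -> float:
    """1 at distance 0, falling linearly to 0 at the threshold and beyond."""
    if threshold <= 0:
        return 1.0 if distance == 0 else 0.0
    return max(0.0, 1.0 - distance / threshold)


def num_similarity(x: float, y: float, max_distance: float) -> float:
    """Sketch of num: numeric distance normalized by maxDistance."""
    return linear_similarity(abs(x - y), max_distance)


def date_similarity(d1: str, d2: str, max_days: int) -> float:
    """Sketch of date: both inputs are "YYYY-MM-DD" strings."""
    days = abs((date.fromisoformat(d1) - date.fromisoformat(d2)).days)
    return linear_similarity(days, max_days)


def wgs84_similarity(lat1: float, lon1: float, lat2: float, lon2: float,
                     threshold_m: float) -> float:
    """Sketch of wgs84 with a linear curve; haversine distance in meters."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    distance_m = 2 * EARTH_RADIUS_M * asin(sqrt(a))
    return linear_similarity(distance_m, threshold_m)


print(num_similarity(10, 15, 10))                        # 0.5
print(date_similarity("2011-01-01", "2011-01-11", 20))   # 0.5
```

Under these assumptions every metric lands in the same 0 to 1 range, which is what lets a link condition combine them.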
Definitely a step towards more than opaque mapping between links. Note, for example, that the Silk – Link Specification Language declares why two or more links are mapped together. More could be said, but this is a start in the right direction.