Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

June 7, 2011

Marketing Topic Maps to Geeks

Filed under: Marketing,RDF,Topic Maps — Patrick Durusau @ 6:54 pm

Another aspect of the “oh woe is topic maps” discussion is the lack of interest in topic maps by geeks. There are open source topic map projects, presentations at geeky conferences, demos, etc., but no real geek swell for topic maps. But the same isn’t true for ontologies, RDF, description logic (ok, maybe less for DL), etc.

In retrospect, that isn’t all that surprising. Take a gander inside any of the software project categories at sourceforge.net. Any of those projects could benefit from more participation, but every year sees more projects in the same categories, oft times covering the same capabilities.

Does any of that say to you: There is an answer and it has to be my answer? I won’t bother with collecting the stats for the lack of code reuse, another aspect of this issue. It is too well known to belabor.

Topic maps made the fatal mistake of saying answers are supplied by users and not developers. If you don’t think that was a mistake, take a look at any RDF vocabulary and tell me it was written by a typical user community. Almost without exception (I am sure there must be some somewhere), RDF vocabularies are written by experts and imposed on users. Hence their popularity, at least among experts.

Topic maps inverted the usual world view to say that since users are the source of the semantics in the texts they read, we should start with their views. Imposing world views is always more popular than learning them, particularly in the geek community. They know what users should be doing and users had damned well better do it.

Oh, the other mistake topic maps made was to say there was more than one world view. Multiple world views that could be aligned with one another. The ontologists scotched that idea decades ago, although they haven’t been able to agree on the one world view that should be in place. I suppose there may be, in lower case, multiple world views, but that set is composed of the correct World View and numerous incorrect world views.

That would certainly be the position of US intelligence and diplomatic circles, who map into the correct World View all “incorrect world views,” which may account for their notable lack of successes over the last fifty or so years.

We should market topic maps to audiences who are interested in their own goals, not the goals of others, even geeks.

Goals vary from group to group. Some groups want to engage in disruptive behavior, other groups wish to prevent disruptive behavior, some want to advance research, still others want to be patent trolls.

Topic maps: Advance your goals with military grade IT. (How’s that for a new topic map slogan?)

June 1, 2011

Silk – A Link Discovery Framework for the Web of Data

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 6:52 pm

Silk – A Link Discovery Framework for the Web of Data

From the website:

The Web of Data is built upon two simple ideas: First, to employ the RDF data model to publish structured data on the Web. Second, to set explicit RDF links between data items within different data sources. Background information about the Web of Data is found at the wiki pages of the W3C Linking Open Data community effort, in the overview article Linked Data – The Story So Far and in the tutorial on How to publish Linked Data on the Web.

The Silk Link Discovery Framework supports data publishers in accomplishing the second task. Using the declarative Silk – Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked. These link conditions may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language. Silk accesses the data sources that should be interlinked via the SPARQL protocol and can thus be used against local as well as remote SPARQL endpoints.

Of particular interest are the comparison operators:

A comparison operator evaluates two inputs and computes their similarity based on a user-defined metric.
The Silk framework currently supports the following similarity metrics, which return a similarity value between 0 (lowest similarity) and 1 (highest similarity) each:

Metric – Description:

  • levenshtein([float maxDistance], [float minValue], [float maxValue]) – String similarity based on the Levenshtein metric.
  • jaro – String similarity based on the Jaro distance metric.
  • jaroWinkler – String similarity based on the Jaro-Winkler metric.
  • qGrams(int q) – String similarity based on q-grams (by default q=2).
  • equality – Returns 1 if the strings are equal, 0 otherwise.
  • inequality – Returns 0 if the strings are equal, 1 otherwise.
  • num(float maxDistance, float minValue, float maxValue) – Computes the numeric distance between two numbers and normalizes it using the threshold. Parameters: maxDistance (the similarity score is 0.0 if the distance is bigger than maxDistance); minValue, maxValue (the minimum and maximum values which occur in the data source).
  • date(int maxDays) – Computes the similarity between two dates (“YYYY-MM-DD” format). At a difference of maxDays the metric evaluates to 0, progressing towards 1 as the difference shrinks.
  • wgs84(string unit, float threshold, string curveStyle) – Computes the geographical distance between two points. Parameters: unit (the unit in which the distance is measured; allowed values: “meter” or “m” (default), “kilometer” or “km”); threshold (results in 0 for all distances bigger than the threshold; values below vary with the curveStyle); curveStyle (“linear” gives a linear transition; “logistic” uses the logistic function f(x)=1/(1+e^(-x)), giving a softer curve with a slow slope at the start and end of the curve but a steep one in the middle).
Author: Konrad Höffner (MOLE subgroup of Research Group AKSW, University of Leipzig)

(better formatting is available at the original page but I thought the operators important enough to report in full here)
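The normalization these metrics share is easy to see in code. Here is a minimal Python sketch of how a levenshtein-style metric might map raw edit distance into the required [0, 1] range; this is not Silk’s code, and the maxDistance default and helper names are mine:

def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]

def levenshtein_similarity(a: str, b: str, max_distance: float = 3.0) -> float:
    """1.0 for identical strings, falling linearly to 0.0 at max_distance."""
    return max(0.0, 1.0 - levenshtein_distance(a, b) / max_distance)

print(levenshtein_similarity("Bill Clinton", "Bill Clinten"))  # 1 edit -> ~0.67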

Definitely a step towards something more than opaque mappings between links. Note, for example, that the Silk Link Specification Language declares why two or more links are mapped together. More could be said, but this is a start in the right direction.

May 30, 2011

Semantic Web Dog Food (There’s a fly in my bowl.)

Filed under: Conferences,OWL,RDF,RDFa,Semantic Web — Patrick Durusau @ 6:59 pm

Semantic Web Dog Food

From the website:

Welcome to the Semantic Web Conference Corpus – a.k.a. the Semantic Web Dog Food Corpus! Here you can browse and search information on papers that were presented, people who attended, and other things that have to do with the main conferences and workshops in the area of Semantic Web research.

We currently have information about

  • 2133 papers,
  • 5020 people and
  • 1273 organisations at
  • 20 conferences and
  • 132 workshops,

and a total of 126886 unique triples in our database!

The numbers looked low to me until I read in the FAQ:

This is not just a site for ISWC [International Semantic Web Conference] and ESWC [European Semantic Web Conference] though. We hope that, in time, other metadata sets relating to Semantic Web activity will be hosted here — additional bibliographic data, test sets, community ontologies and so on.

This illustrates a persistent problem of the Semantic Web. This site has one way to encode the semantics of these papers, people, conferences and workshops. Other sources of semantic data on these papers, people, conferences and workshops may well use other ways to encode those semantics. And every group has what it feels are compelling reasons for following its choices and not the choices of others. Assuming they are even aware of the choices of others. (Discovery being another problem but I won’t talk about that now.)

The previous semantic diversity of natural language is now represented by a semantic diversity of ontologies and URIs. Now our computers can more rapidly and reliably detect that we are using different vocabularies. The SW seems like a lot of work for such a result. Particularly since we continue to use diverse vocabularies and more diverse vocabularies continue to arise.

The SW solution, using OWL Full:

5.2.1 owl:sameAs

The built-in OWL property owl:sameAs links an individual to an individual. Such an owl:sameAs statement indicates that two URI references actually refer to the same thing: the individuals have the same “identity”.

For individuals such as “people” this notion is relatively easy to understand. For example, we could state that the following two URI references actually refer to the same person:

<rdf:Description rdf:about="#William_Jefferson_Clinton">
<owl:sameAs rdf:resource="#BillClinton"/>
</rdf:Description>

The owl:sameAs statements are often used in defining mappings between ontologies. It is unrealistic to assume everybody will use the same name to refer to individuals. That would require some grand design, which is contrary to the spirit of the web.

In OWL Full, where a class can be treated as instances of (meta)classes, we can use the owl:sameAs construct to define class equality, thus indicating that two concepts have the same intensional meaning. An example:

<owl:Class rdf:ID="FootballTeam">
<owl:sameAs rdf:resource="http://sports.org/US#SoccerTeam"/>
</owl:Class>

One could imagine this axiom to be part of a European sports ontology. The two classes are treated here as individuals, in this case as instances of the class owl:Class. This allows us to state that the class FootballTeam in some European sports ontology denotes the same concept as the class SoccerTeam in some American sports ontology. Note the difference with the statement:

<footballTeam owl:equivalentClass us:soccerTeam />

which states that the two classes have the same class extension, but are not (necessarily) the same concepts.

Anyone see a problem? Other than requiring the use of OWL Full?

The absence of any basis for “…denotes the same concept as…”? I can’t safely reuse this axiom because I don’t know on what basis its author made such a claim. The URIs may provide further information that satisfies me the axiom is correct, but that still leaves me in the dark as to why the author of the axiom thought it correct. Overly precise for football/soccer ontologies, you say, but what of drug interaction ontologies? Or ontologies that govern highly sensitive intelligence data?
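To make the point concrete, here is a minimal sketch (Python with rdflib; the European sports URI is hypothetical) of such a mapping. The entire claim is one triple; the data model has no slot for the evidence or reasoning behind it:

from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()
football = URIRef("http://example.org/eu-sports#FootballTeam")  # hypothetical URI
soccer = URIRef("http://sports.org/US#SoccerTeam")

# The whole mapping is a single triple; nothing in the data model records
# the evidence or reasoning behind the equivalence claim.
g.add((football, OWL.sameAs, soccer))
print(g.serialize(format="turtle"))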

So we repeat semantic diversity, create maps to overcome the repeated semantic diversity and the maps we create have no explicit basis for the mappings they represent. Tell me again why this was a good idea?

May 24, 2011

Cassa

Filed under: RDF,SPARQL,Topic Maps — Patrick Durusau @ 10:25 am

Cassa

From the webpage:

A SPARQL 1.1 Graph Store HTTP Protocol [1] implementation for RDF and Topic Maps.

[1] SPARQL 1.1 Graph Store HTTP Protocol

The somewhat longer announcement on topicmapmail, SPARQL 1.1 Graph Store HTTP Protocol for Topic Maps:

Last week I discovered the SPARQL 1.1 Graph Store HTTP Protocol [1] and I wondered if this wouldn’t be a good alternative to SDShare [2].

The graph store protocol uses no artificial technologies like Atom but uses REST and RDF consistently. The service uses an ontology [3] to inform the client about available graphs etc.

The protocol allows creation of graphs, deletion of graphs, updating of graphs and discovery of graphs (through the service description).

The protocol is rather generic, so it’s usable for Topic Maps as well (graph == topic map).

The protocol provides no fragments/snapshots like SDShare, though. Adding this functionality to the protocol would be interesting, I’d think. I.e. each graph update would trigger a new fragment. Maybe this functionality would also solve the “push problem” [4] without inventing yet another syntax. The description of the available fragments should also be done with an ontology and not solely with Atom, though.

Anyway, I wanted to mention it as a good, *dogfooding* protocol which could be used for Topic Maps.

I created an implementation (Cassa) of the protocol at [5] (no release yet). The implementation supports Topic Maps and RDF but it doesn’t provide the service description yet. And I didn’t translate the service description ontology to Topic Maps yet.

[1] <http://www.w3.org/TR/2011/WD-sparql11-http-rdf-update-20110512/>
[2] <http://www.egovpt.org/fg/CWA_Part_1b>
[3] <http://www.w3.org/TR/2011/WD-sparql11-service-description-20110512/>
[4] <http://www.infoloom.com/pipermail/topicmapmail/2010q4/008761.html>
[5] <https://github.com/heuer/cassa>
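For readers who haven’t seen the Graph Store Protocol in action, a sketch of the HTTP interaction it standardizes (Python with requests; the endpoint and graph URIs below are hypothetical):

import requests

STORE = "http://localhost:8080/graph-store"        # hypothetical endpoint
GRAPH = "http://example.org/graphs/my-topic-map"   # hypothetical graph URI

turtle = """
@prefix ex: <http://example.org/> .
ex:puccini ex:composerOf ex:tosca .
"""

# PUT creates or replaces the named graph, GET retrieves it, DELETE drops it;
# the graph is identified indirectly via the ?graph= query parameter.
requests.put(STORE, params={"graph": GRAPH}, data=turtle,
             headers={"Content-Type": "text/turtle"})
response = requests.get(STORE, params={"graph": GRAPH},
                        headers={"Accept": "text/turtle"})
print(response.text)
requests.delete(STORE, params={"graph": GRAPH})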

May 20, 2011

SIREn: Efficient semi-structured Information Retrieval for Lucene

Filed under: Information Retrieval,Lucene,RDF — Patrick Durusau @ 4:06 pm

SIREn: Efficient semi-structured Information Retrieval for Lucene

From the announcement:

Efficient, large scale handling of semi-structured data (including RDF) is increasingly an important issue to many web and enterprise information reuse scenarios.

Querying graph structured data (RDF) is commonly achieved using specific solutions, called triplestores, typically based on DBMS backends. In Sindice we however needed something much more scalable than DBMS and with the desirable features of the typical Web Search engines: top-k query processing, real time updates, full text search, distributed indexes over shards, etc.

While Lucene has long offered these capabilities, its native capabilities are not intended for large semi-structured document collections (or documents with very different schemas). For this reason we developed SIREn – Semantic Information Retrieval Engine – a Lucene plugin to overcome these shortcomings and efficiently index and query RDF, as well as any textual document with an arbitrary amount of metadata fields.

Given its general applicability, we are delighted to release SIREn under the Apache 2.0 open source license. We hope businesses will find SIREn useful in implementing solutions upon the Web of Data.

You can start by looking at the features, review the performance benchmarks, learn more by reading the short tutorial and then download and try SIREn by yourself.

This looks very cool!

Its tuple processing capabilities in particular!

May 18, 2011

Balisage 2011 Preliminary Program

Filed under: Conferences,Data Mining,RDF,SPARQL,XPath,XQuery,XSLT — Patrick Durusau @ 6:40 pm

At-A-Glance

Program (in full)

From the announcement (Tommie Usdin):

Topics this year include:

  • multi-ended hypertext links
  • optimizing XSLT and XQuery processing
  • interchange, interoperability, and packaging of XML documents
  • eBooks and epub
  • overlapping markup and related topics
  • visualization
  • encryption
  • data mining

The acronyms this year include:

XML XSLT XQuery XDML REST XForms JSON OSIS XTemp RDF SPARQL XPath

New this year will be:

Lightning talks: an opportunity for participants to say what they think, simply, clearly, and persuasively.

As I have said before, simply the best conference of the year!

Conference site: http://www.balisage.net/

Registration: http://www.balisage.net/registration.html

May 13, 2011

SPARQL 1.1 Drafts – Last Call

Filed under: Query Language,RDF,SPARQL — Patrick Durusau @ 7:19 pm

SPARQL 1.1 Drafts – Last Call

From the W3C News:

May 2, 2011

Triplification Challenge 2011

Filed under: Conferences,RDF — Patrick Durusau @ 10:31 am

Triplification Challenge 2011

From the website:

The yearly organized Linked Data Triplification Challenge awards prizes to the most promising application demonstrations and approaches in three fields related to Linked Data.

For the success of the Semantic Web it is from our point of view crucial to overcome the chicken-and-egg problem of missing semantic representations on the Web and the lack of their utilization within concrete applications, to solve real-world problems. The Triplification Challenge aims to expedite this process by raising awareness and showcasing best practices.

3.000 € in prize money will be awarded to the winners of the open track and the special Open Government Data track.

The challenge is open to anyone interested in applying Semantic Web and Linked Data technologies. This might include students, developers, researchers, and people from industry. Individual or group submissions are both acceptable.

Could be an interesting opportunity to expose topic maps as triples. Not to mention an attractive prize!

Important dates:

Extended Submission Deadline: May 30, 2011

Notification of Acceptance: June 27, 2011

Camera-Ready Paper: July 18, 2011

I-SEMANTICS 2011: September 7 – 9, 2011

April 26, 2011

…Efficient Subgraph Matching on Huge Networks (or, > 1 billion edges < 1 second)

Filed under: Graphs,Networks,RDF — Patrick Durusau @ 2:18 pm

A Budget-Based Algorithm for Efficient Subgraph Matching on Huge Networks by Matthias Bröcheler, Andrea Pugliese, and V.S. Subrahmanian. (Presented at GDM 2011.)

Abstract:

As social network and RDF data grow dramatically in size to billions of edges, the ability to scalably answer queries posed over graph datasets becomes increasingly important. In this paper, we consider subgraph matching queries which are often posed to social networks and RDF databases — for such queries, we want to find all matching instances in a graph database. Past work on subgraph matching queries uses static cost models which can be very inaccurate due to long-tailed degree distributions commonly found in real world networks. We propose the BudgetMatch query answering algorithm. BudgetMatch costs and recosts query parts adaptively as it executes and learns more about the search space. We show that using this strategy, BudgetMatch can quickly answer complex subgraph queries on very large graph data. Specifically, on a real world social media data set consisting of 1.12 billion edges, we can answer complex subgraph queries in under one second and significantly outperform existing subgraph matching algorithms.

Built on top of Neo4j, BudgetMatch dynamically updates budgets assigned to vertices.

Aggressive pruning gives some rather attractive results.

April 20, 2011

RDFa API and RDFa 1.1 Primer Drafts Updated

Filed under: RDF,RDFa,Semantic Web — Patrick Durusau @ 2:10 pm

The RDF Web Applications Working Group has published new Working Drafts.

RDFa API

RDFa 1.1 Primer

March 30, 2011

State of the LOD Cloud

Filed under: LOD,RDF,Semantic Web — Patrick Durusau @ 12:36 pm

State of the LOD Cloud

A more complete resource than the one I referenced in The Linking Open Data cloud diagram.

I haven’t seen any movement towards solving any of the fundamental identity issues with the LOD cloud.

On the other hand, topic mappers can make use of these URIs as names and specify other data published with those URIs to form an actual identification.

One that is reliably interchangeable with others.

I think the emphasis on URIs being dereferenceable is telling.

No one says what happens after a URI is dereferenced, because that would mean admitting that a URI by itself is insufficient as an identifier.

March 23, 2011

RDF and Semantic Web

Filed under: RDF,Semantic Web,Topic Maps — Patrick Durusau @ 6:03 am

RDF and Semantic Web: can we reach escape velocity?

Jeni Tennison’s slides from TPAC 2010 are an interesting insight into how an “insider” views the current state of RDF and the Semantic Web.

I disagree with her on a couple of crucial points:

RDF’s only revolution, but the key one, is using URIs to name things, including properties and classes

identifying things with URIs does two really useful things

  • disambiguates, enabling joins with other data using same URI
    • mash-ups beyond mapping things on a Google Map
  • provides something at the end of the URI
    • extra information, explanation, context
    • in a basic entity-attribute-value model that enables combination without either up-front agreement or end-user jiggerypokery

First, the “identifying things with URIs” is re-use of a very old idea, the perfect language, which has a universal and unbroken record of failure. (see my Blast from the Past and citations therein.)

Second, how is combination possible without either up-front agreement or end-user jiggerypokery?

Combining information without either up-front agreement or end-user jiggerypokery is why we get such odd search results now.

Let’s take a simple example. Search for “democracy” and see what results you get.

Now, do you really think that “democracy” (limiting my remarks to the US at the moment) from documents in the 18th and 19th centuries means the same thing as “democracy” after the fall of slavery but prior to women getting the right to vote? Or does it mean the same thing as it does today? Or does it mean the same thing as its use in Egypt, where classes other than the moneyed ones may be favored?

No doubt you will say that someone could create URIs for all those senses of democracy, which is true, but the question is will we use them consistently? The answer to that has been no up to this point.

People are inconsistent, semantically speaking, and there is no showing that is going to change.

Which brings me to the second major area of my disagreement.

RDF and the Semantic Web are failing (present tense) because they are the answer to a problem RDF and Semantic Web followers are interested in solving.

But they are not the answer to problems that interest anyone else.

At least not enough to pay the price of RDF and the Semantic Web.

To be fair, topic maps faces the same issue.

But at least topic maps started off with a particular problem (combining indexes) and then expanded to be a general solution.

The Semantic Web started off as a general solution in search of problems that would justify the cost of adoption. Not the best strategy.

I do like Jeni’s emphasis on assisting governments to make their data usefully available. That is a good thing and one that we agree on.

Both topic maps and RDF/SW need to analyze the problems of governments (and others) in making such data available.

Then, understanding the issues they face, derive as low cost a solution as possible within their paradigms to solve that problem.

That could involve URIs, for example, assuming there were a protocol by which a URI + N properties serve to identify a subject.

Not that such a protocol makes us any more semantically consistent, but having more than one property to be inconsistent about, may (emphasis on may) reduce the range of semantic inconsistency.

Take my democracy example. If I had http://NotRealURI/democracy plus a range property of 1800-1850, and matching my sense of democracy required matching both the URI and the date range, that would be a step towards reducing semantic inconsistency.
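A rough sketch of what such matching could look like; the proxy structure and property names are invented for illustration:

def same_subject(a: dict, b: dict, required=("uri", "range")) -> bool:
    """Proxies identify the same subject only if *all* required keys match."""
    return all(a.get(k) == b.get(k) for k in required)

early = {"uri": "http://NotRealURI/democracy", "range": "1800-1850"}
modern = {"uri": "http://NotRealURI/democracy", "range": "1920-"}

print(same_subject(early, modern))       # False: same URI, different range
print(same_subject(early, dict(early)))  # True: URI and range both agree

Matching on the URI alone would have conflated the two senses.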

It is the lack of a requirement that more than one property be matched for identity that underlies the technical failure of RDF/Semantic Web.

Its social failure is in not answering questions that are of interest to developers and ultimately users.

Providing useful answers to problems, seen by users as problems, is the way forward for both topic maps and RDF/Semantic Web.

Linked Data: Evolving the Web into a Global Data Space (The Online Book)

Filed under: Linked Data,RDF,Topic Maps — Patrick Durusau @ 6:01 am

Linked Data: Evolving the Web into a Global Data Space (The Online Book)

The Principles of Linked Data:

1. Use URIs as names for things.
2. Use HTTP URIs, so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.
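In practice, principles 2 and 3 boil down to HTTP content negotiation. A minimal sketch, using a DBpedia URI as the customary example (actual server behavior may vary):

import requests

# Ask the URI for machine-readable data rather than HTML; Linked Data
# servers typically answer with a redirect to an RDF representation.
response = requests.get("http://dbpedia.org/resource/Democracy",
                        headers={"Accept": "text/turtle"})
print(response.status_code, response.headers.get("Content-Type"))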

First observation/question

What made the original WWW proposal different from all hypertext systems before it?

There had been a number of hypertext systems before the WWW, some with capabilities that the WWW continues to lack.

But what made it different or, perhaps better, successful?

That links could fail?

Oh, but if we are going to have a global data space that identifies stuff, it can’t fail. Yes?

So, we are taking a flexible, fault tolerant system (the World Wide Web) and making it into an inflexible, brittle system (the Semantic Web).

That sounds like a very, very bad plan.

Second observation/question

Global data space?

Even allowing for marketing puff, that is a bit of a stretch. Well, more than that, it is an outright lie.

Consider all the data now being collected by the Large Hadron Collider at CERN. So much data that some of it has to be discarded. They simply can’t keep it all.

Or all the data from previous space missions and astronomical observations, both visible and in other bands.

Or all the legal (and one assumes illegal) records of government activity.

Or all the other information, records, data from human activity.

And not just the documents, but the stuff people talk about in them and the relationships between the things they talk about.

Some of that can be addressed or obtained over the web, but that isn’t the same thing as identifying all the stuff talked about in that material on the WWW.

Now, if Linked Data wanted to claim that the WWW was a global data space for information of interest to a particular group, well, that comes closer to being believable at least.

*****

However silly a single, unifying data model may sound, it is true that making data more accessible, by any means, makes it easier to make sensible use of it.

Despite its drank-the-Kool-Aid perspective on linked data, this book is a useful introduction to the technology.

Ignore the “…put your hand on the radio and feel the power…” type stuff.

Keep saying to yourself: “it’s just another format, it’s just another format…,” and you will be fine.

March 15, 2011

RDF – Gravity

Filed under: Graphs,RDF — Patrick Durusau @ 5:10 am

RDF – Gravity

From the website:

RDF Gravity is a tool for visualising RDF/OWL graphs/ontologies.

Its main features are:

  • Graph Visualization
  • Global and Local Filters (enabling specific views on a graph)
  • Full text Search
  • Generating views from RDQL Queries
  • Visualising multiple RDF files

RDF Gravity is implemented by using the JUNG Graph API and Jena semantic web toolkit.

Truly stunning work.

Too bad that RDF will never progress beyond simple indexing to complex and interchangeable indexing.

I say that because, so long as Tim Berners-Lee clings to the notion of new names (URLs) as overcoming the problems with old names (anything else), RDF is unlikely to improve.

If RDF could identify a subject using multiple properties and then inform others of that complex identification, then at least there would be an opportunity to either agree or disagree with an identification.

As it is now, who knows what anyone is identifying with a URL?

Your guess is as good as mine.

So if I were to say that “http://semweb.salzburgresearch.at/apps/rdf-gravity/index.html” is truly stunning work, do I mean the software? The website? Something I saw at the website?

If that sounds trivial, imagine the same situation and the URL is a pointer to a procedure for the coolant system on a nuclear reactor. Not quite so trivial is it?

Best to know what we are talking about in most situations.

March 14, 2011

Personal Semantic Data – PSD 2011

Filed under: Conferences,RDF,Semantic Web,Semantics — Patrick Durusau @ 6:51 am

Personal Semantic Data – PSD 2011

From the website:

Personal Semantic Data is scattered over several media, and while semantic technologies are already successfully deployed on the Web as well as on the desktop, data integration is not always straightforward. The transition from the desktop to a distributed system for Personal Information Management (PIM) raises new challenges which need to be addressed. These challenges overlap areas related to human-computer interaction, user modeling, privacy and security, information extraction, retrieval and matching.

With the growth of the Web, a lot of personal information is kept online, on websites like Google, Amazon, Flickr, YouTube, Facebook. We also store pieces of personal information on our computers, on our phones and other devices. All the data is important, that’s why we keep it, but managing such a fragmented system becomes a chore on its own instead of providing support and information for doing the tasks we have to do. Adding to the challenge are proprietary formats and locked silos (online or offline in applications).

The Semantic Web enables the creation of structured and interlinked data through the use of common vocabularies to describe it, and a common representation – RDF. Through projects like Linking Open Data (LOD), SIOC and FOAF, large amounts of data is available now on the Web in structured form, including personal information about people and their social relationships. Applying semantic technologies to the desktop resulted in the Semantic Desktop, which provides a framework for linking data on the desktop.

The challenge lies in extending the benefits of the semantic technologies across the borders of the different environments, and providing a uniform view of one’s personal information regardless of where it resides, which vocabularies were used to describe it and how it is represented. Sharing personal semantic data is also challenging, with privacy and security being two of the most important and difficult issues to tackle.

Important Dates:

15 April 2011 – Submission deadline
30 April 2011 – Author notification
10 May 2011 – Camera-ready version
26 June 2011 – Workshop day

I think the secret of semantic integration is this: the more information that becomes available, the more heterogeneous the systems and information become, and the greater the need for topic maps.

Mostly because replacing that many systems in a coordinated way, over the vast diversity of interests and users, simply isn’t possible.

Would be nice to have a showing of interest by topic maps at this workshop.

February 23, 2011

Berlin SPARQL Benchmark (BSBM)

Filed under: RDF,SPARQL — Patrick Durusau @ 3:06 pm

Berlin SPARQL Benchmark (BSBM)

From the website:

The SPARQL Query Language for RDF and the SPARQL Protocol for RDF are implemented by a growing number of storage systems and are used within enterprise and open web settings. As SPARQL is taken up by the community there is a growing need for benchmarks to compare the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources.

The Berlin SPARQL Benchmark (BSBM) defines a suite of benchmarks for comparing the performance of these systems across architectures. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The benchmark query mix illustrates the search and navigation pattern of a consumer looking for a product.

NEWS

02/22/2011: Results of the February 2011 BSBM V3 Experiment released, benchmarking Virtuoso, BigOWLIM, 4store, BigData and Jena TDB with 100 million and 200 million triples datasets within the Explore and Update use cases

Seriously sized benchmarking files.

I do wonder how diverse the file content is compared to content in the “wild,” so to speak.

AI Mashup Challenge 2011

Filed under: Mashups,RDF,Semantic Web — Patrick Durusau @ 3:01 pm

AI Mashup Challenge 2011

Due date: 1 April 2011

From the website:

The AI mashup challenge accepts and awards mashups that use AI technology, including but not restricted to machine learning and data mining, machine vision, natural language processing, reasoning, ontologies and the semantic web.
Imagine for example:

  • Information extraction or automatic text summarization to create a task-oriented overview mashup for mobile devices.
  • Semantic Web technology and data sources adapting to user and task-specific configurations.
  • Semantic background knowledge (such as ontologies, WordNet or Cyc) to improve search and content combination.
  • Machine translation for mashups that cross language borders.
  • Machine vision technology for novel ways of aggregating images, for instance mixing real and virtual environments.
  • Intelligent agents taking over simple household planning tasks.
  • Text-to-speech technology creating a voice mashup with intelligent and emotional intonation.
  • The display of PubMed articles on a map based on geographic entity detection referring to diseases or health centers.

The emphasis is not on providing and consuming semantic markup, but rather on using intelligence to mashup these resources in a more powerful way.

This looks like an opportunity for an application that assists users in explicit identification or confirmation of identification of subjects.

Rather than auto-correcting, human-correcting.

Assuming we can capture the corrections, wouldn’t that mean that our apps would incrementally get “smarter?” Rather than starting off from ground zero with each request? (True, a lot of analysis goes on with logs, etc. Why not just ask?)

February 18, 2011

DataLift

Filed under: Dataset,Linked Data,Ontology,RDF — Patrick Durusau @ 5:12 am

DataLift

The DataLift project will no doubt produce some useful tools and output but reading its self-description:

The project will provide tools allowing to facilitate each step of the publication process:

  1. selecting ontologies for publishing data
  2. converting data to the appropriate format (RDF using the selected ontology)
  3. publishing the linked data
  4. interlinking data with other data sources

I am struck by how futile the effort sounds in the face of petabytes of data flow, changing semantics of that data and changing semantics of other data, with which it might be interlinked.

The nearest imagery I can come up with is trying to direct the flow of a tsunami with a roll of paper towels.

It is certainly brave (I forgo usage of the other term) to try, but ultimately it isn’t very productive.

First, any scheme that starts with conversion to a particular format is an automatic loser.

The source format is itself composed of subjects that are discarded by the conversion process.

Moreover, what if we disagree about the conversion?

Remember all the semantic diversity that gave rise to this problem? Where did it get off to?

Second, the interlinking step introduces brittleness into the process.

Both in terms of the ontology that any particular data must follow but also in terms of resolution of any linkage.

Other data sources can only be linked in if they use the correct ontology and format. And that assumes they are reachable.

I hope the project does well, but at best it will result in another semantic flavor to be integrated using topic maps.

*****
PS: The use of “data heaven” betrays the religious nature of the Linked Data movement. I don’t object to Linked Data. What I object to is the missionary conversion aspects of Linked Data.

New RDF Charter (rant on backwards compatibility)

Filed under: RDF — Patrick Durusau @ 4:49 am

RDF Working Group Charter

A new RDF Working Group charter has been announced.

One serious problem:

For all new features, backwards compatibility with the current version of RDF is of great importance. This means that all efforts should be made so that

  • any valid RDF graphs (in terms of the RDF 2004 version) should remain valid in terms of a new version of RDF; and
  • any RDF or RDFS entailment drawn on RDF graphs using the 2004 semantics should be valid entailment in terms of a new version of RDF and RDFS

Care should be taken to not jeopardize existing RDF deployment efforts and adoption. In case of doubt, the guideline should be not to include a feature in the set of additions if doing so might raise backward compatibility issues.

What puzzles me is why this mis-understanding of backwards compatibility continues to exist.

Any RDF graph that is valid under 2004 RDF remains valid under the 2004 semantics, and entailments drawn under the 2004 semantics remain 2004 entailments. OK, so?

What is the difficulty with labeling the new version of RDF, RDF-NG? With appropriate tokens in any syntax?

True, that might mean that 7 year old software and libraries might not continue to work. How many users do you think are intentionally using 7 year old software?

Ah, you mean you are still writing calls to 7 year old libraries in your software? Whose bad is that?

Same issue is about to come up in other circles, some closer to home than others.

*****

PS: What is particularly annoying is that some vendors insist (falsely) that ISO is too slow for their product development models.

How can ISO be too slow if every error is enshrined forever in the name of backwards compatibility?

If RDF researchers haven’t learned anything they would do differently in RDF in the last seven years, well, that’s just sad.

January 27, 2011

Easy Semantic Solution Is At Hand! – Post

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 6:15 am

The Federated Enterprise (Using Semantic Technology Standards to Federate Information and to Enable Emergent Analytics)

I had to shorten the title a bit. 😉

Wanted you to be aware of the sort of nonsense that data warehouse people are being told:

The procedure described above enabling federation based on semantic technology is not hard to build; it is just a different way of describing things that people in your enterprise are already describing using incompatible technologies like spreadsheets, text processors, diagramming tools, modeling tools, email, etc. The semantic approach simply requires that everything be described in a single technology, RDF/OWL. This simple change in how things are described enables federation and the paradigm shifting capabilities that accompany it.

Gee, why didn’t we think about that? A single technology to describe everything.

Shakespeare would call this …a tale told by an idiot….

Just thought you could start the day with a bit of amusement.

*****
PS: It’s not the fault of RDF or OWL that people say stupid things about them.

January 11, 2011

1st International Workshop on Semantic Publication (SePublica 2011)

Filed under: Conferences,Ontology,OWL,RDF,Semantic Web,SPARQL — Patrick Durusau @ 7:24 pm

1st International Workshop on Semantic Publication (SePublica 2011) in connection with 8th Extended Semantic Web Conference (ESWC 2011), May 29th or 30th, Hersonissos, Crete, Greece.

From the Call for Papers:

The CHALLENGE of the Semantic Web is to allow the Web to move from a dissemination platform to an interactive platform for networked information. The Semantic Web promises to “fundamentally change our experience of the Web”.

In spite of improvements in the distribution, accessibility and retrieval of information, little has changed in the publishing industry so far. The Web has succeeded as a dissemination platform for scientific and non-scientific papers, news, and communication in general; however, most of that information remains locked up in discrete documents, which are poorly interconnected to one another and to the Web.

The connectivity tissues provided by RDF technology and the Social Web have barely made an impact on scientific communication or on ebook publishing, neither on the format of publications, nor on repositories and digital libraries. The worst problem is in accessing and reusing the computable data which the literature represents and describes.

No, I am not going to say that topic maps are the magic bullet that will solve all those issues or the ones listed in their Questions and Topics of Interest.

What I do think topic maps bring to the table is an awareness that semantic interoperability isn’t primarily a format or computational problem.

Every new (and impliedly universal) format or model simply compounds the semantic interoperability problem.

By creating yet more formats and/or models between which semantic interoperability has to be designed.

Starting with the question of what subjects need to be identified and how they are identified now could lead to a viable, local semantic interoperability solution.

What more could a client want?

Local semantic interoperability solutions can form the basis for spreading semantic interoperability, one solution at a time.

*****
PS: Forgot the important dates:

Paper/Demo Submission Deadline: February 28, 23:59 Hawaii Time

Acceptance Notification: April 1

Camera Ready Version: April 15

SePublica Workshop: May 29 or May 30 (to be announced)

December 17, 2010

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

From the call:

Research on the Semantic Web, which is now in its second decade, has had a tremendous success in encouraging people to publish data on the Web in structured, linked, and standardized ways. The success of what has now become the Web of Data can be read from the sheer number of triples available within the Linked-Open Data, Linked Life Data and Open-Government initiatives. However, this growth in data makes many of the established assumptions inappropriate and offers a number of new research challenges.

In stark contrast to early Semantic Web applications that dealt with small, hand-crafted ontologies and data-sets, the new Web of Data comes with a plethora of contradicting world-views and contains incomplete, inconsistent, incorrect, fast-changing and opinionated information. This information not only comes from academic sources and trustworthy institutions, but is often community built, scraped or translated.

In short: the Web of Data is messy, and methods to deal with this messiness are paramount for its future.

Now, we have two choices as the topic map community:

  • congratulate ourselves for seeing this problem long ago, high five each other, etc., or
  • step up and offer topic map solutions that incorporate as much of the existing SW work as possible.

I strongly suggest the second one.

Important dates:

We will aim at an efficient publication cycle in order to guarantee prompt availability of the published results. We will review papers on a rolling basis as they are submitted and explicitly encourage submissions well before the submission deadline. Submit papers online at the journal’s Elsevier Web site.

Submission deadline: 1 February 2011
Author notification: 15 June 2011

Revisions submitted: 1 August 2011
Final decisions: 15 September 2011
Publication: 1 January 2012

December 8, 2010

Semantic Web – Journal Issue 1/1-2

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 8:18 pm

Semantic Web

The first issue of Semantic Web is openly viewable and now online.

In their introductory remarks the editors focus in part on the journal’s subtitle:

The journal’s subtitle – Interoperability, Usability, Applicability – reflects the wide scope of the journal, by putting an emphasis on enabling new technologies and methods. Interoperability refers to aspects such as the seamless integration of data from heterogeneous sources, on-the-fly composition and interoperation of Web services, and next-generation search engines. Usability encompasses new information retrieval paradigms, user interfaces and interaction, and visualization techniques, which in turn require methods for dealing with context dependency, personalization, trust, and provenance, amongst others, while hiding the underlying computational issues from the user. Applicability refers to the rapidly growing application areas of Semantic Web technologies and methods, to the issue of bringing state-of-the-art research results to bear on real-world applications, and to the development of new methods and foundations driven by real application needs from various domains.

Skimming the table of contents I can see lots of opportunity for comments and rejoinders.

For the present I simply commend this new journal and its contents to you for your reading pleasure.

December 7, 2010

Open Provenance Model * Ontology – RDF – Semantic Web

Filed under: Ontology,RDF,Semantic Web — Patrick Durusau @ 11:37 am

A spate of provenance ontology materials landed in my inbox today:

  1. Open Provenance Model Ontology (OPMO)
  2. Open Provenance Model Vocabulary (OPMV)
  3. Open Provenance Model (OPM)
  4. Provenance Vocabulary Mappings

We should count ourselves fortunate that the W3C working group did not title their document: Open Provenance Model Vocabulary Mappings.

The community would be better served with less clever and more descriptive naming.

No doubt the Open Provenance Model Vocabulary (#2 above) has some range of materials in mind.

I don’t know the presumed target but some candidates come to mind:

  • Art Museum Open Provenance Model (including looting/acquisition terms)
  • Library Open Provenance Model
  • Natural History Open Provenance Model
  • ….

I am, of course, giving the authors the benefit of the doubt in presuming their intent was not to create a universal model of provenance.

For topic map purposes, the Provenance Vocabulary Mappings document (#4 above) is the most interesting. Read through it and then answer the questions below.

Questions:

  1. Assume you have yet another provenance vocabulary. On what basis would you map it to any of the other vocabularies discussed in #4?
  2. Most of the mappings in #4 give a rationale. How is that (if it is) different from properties and merging rules for topic maps?
  3. What should we do with mappings in #4 or elsewhere that don’t give a rationale?
  4. How should we represent rationales for mappings? Is there some alternative not considered by topic maps?

Summarize your thoughts in 3-5 pages for all four questions. They are too interrelated to answer separately. You can use citations if you like but these aren’t questions answered in the literature. Or, well, at least I don’t find any of the answers in the literature convincing. 😉 Your experience may vary.

December 3, 2010

Declared Instance Inferences (DI2)? (RDF, OWL, Semantic Web)

Filed under: Inference,OWL,RDF,Semantic Web,Subject Identity — Patrick Durusau @ 8:49 am

In recent discussions of identity, I have seen statements that OWL reasoners could infer that two or more representatives stood for the same subject.

That’s useful, but I wondered whether the inferencing overhead is necessary in all such cases.

If a user recognizes that a subject representative (a subject proxy in topic map terms) represents the same subject as another representative, a declarative statement avoids the need for artificial inferencing.

I am sure there are cases where inferencing is useful, particularly to suggest inferences to users, but declared inferences could reduce that need and the overhead.

Declarative information artifacts could be created that contain rules for known identifications.

For example, gene names found in PubMed. If two or more names are declared to refer to the same gene, where is the need for inferencing?

With such declarations in place, no reasoner has to “infer” anything about those names.
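A sketch of what such a declaration artifact might amount to in code; the gene names below are illustrative only:

# Declared co-references: a lookup table stands in for run-time reasoning.
DECLARED_SAME = [
    {"TP53", "p53", "tumor protein p53"},
    {"BRCA1", "breast cancer type 1 susceptibility protein"},
]

canonical = {}
for group in DECLARED_SAME:
    representative = sorted(group)[0]  # any stable choice works
    for name in group:
        canonical[name] = representative

def same_gene(a: str, b: str) -> bool:
    """No reasoner required: just consult the declarations."""
    return canonical.get(a, a) == canonical.get(b, b)

print(same_gene("p53", "TP53"))        # True, by declaration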

Declared instance inferences (DI2) reduce semantic dissonance, inferencing overhead and uncertainty.

Looks like a win-win situation to me.

*****
PS: It occurs to me that ontologies are also “declared instance inferences” upon which artificial reasoners rely. The instances happen to be classes and not individuals.

November 30, 2010

RDF Extension for Google Refine

Filed under: Data Mining,RDF,Software — Patrick Durusau @ 1:09 pm

RDF Extension for Google Refine

From the website:

This project adds a graphical user interface (GUI) for exporting data of Google Refine projects in RDF format. The export is based on mapping the data to a template graph using the GUI.

See my earlier post on Google Refine 2.0.

BTW, if you don’t know the folks at DERI – Digital Enterprise Research Institute take a few minutes (it will stretch into hours) to explore their many projects. (I will be doing a separate post on projects of particular interest for topic maps from DERI soon.)

November 28, 2010

Names, Identifiers, LOD, and the Semantic Web

Filed under: LOD,Names,RDF,Semantic Web,Subject Identifiers — Patrick Durusau @ 5:28 pm

I have been watching the identifier debate in the LOD community with its revisionists, personal accounts and other takes on what the problem is, if there is a problem and how to solve the problem if there is one.

I have a slightly different question: What happens when we have a name/identifier?

Short of being present when someone points to or touches an object, themselves, or you (if the TSA) and says a name or identifier, what happens?

Try this experiment. Take a sheet of paper and write: George W. Bush.

Now write 10 facts about George W. Bush.

Please circle the ones you think must match to identify George W. Bush.

So, even though you knew the name George W. Bush, isn’t it fair to say that the circled facts are what you would use to identify George W. Bush?

Here’s is the fun part: Get a colleague or co-worker to do the same experiment. (Substitute Lady Gaga if your friends don’t know enough facts about George W. Bush.)

Now compare several sets of answers for the same person.

Working from the same name, you most likely listed different facts and different ones you would use to identify that subject.

Even though most of you would agree that some or all of the facts listed go with that person.

It sounds like even though we use identifiers/names, those just clue us in on facts, some of which we use to make the identification.

That’s the problem isn’t it?

A name or identifier can make us think of different facts (possibly identifying different subjects) and even if the same subject, we may use different facts to identify the subject.

Assuming we are handed a set of facts (an RDF graph, whatever), we need to know: what facts identify the subject?

And a subject may have different identifying properties, depending on the context of identification.
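One way to picture the problem in code, with invented facts and contexts:

# The same subject, identified by different property sets in different contexts.
facts = {
    "name": "George W. Bush",
    "office": "43rd President of the United States",
    "born": "1946-07-06",
    "father": "George H. W. Bush",
}

# Which facts are *identifying* depends on who is doing the identifying.
identifying = {
    "biographer": {"name", "born", "father"},
    "political scientist": {"name", "office"},
}

def identifies(candidate: dict, context: str) -> bool:
    """A candidate matches only if every identifying fact for the context agrees."""
    return all(candidate.get(k) == facts[k] for k in identifying[context])

print(identifies({"name": "George W. Bush",
                  "office": "43rd President of the United States"},
                 "political scientist"))  # True, in this context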

Questions:

  1. How to specify essential facts for identification as opposed to the extra ones?
  2. How to answer #1 for an RDF graph?
  3. How do you make others aware of your answer in #2?

Comments/suggestions?

November 26, 2010

Scalable reduction of large datasets to interesting subsets

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 11:04 am

Scalable reduction of large datasets to interesting subsets

Authors: Gregory Todd Williams, Jesse Weaver, Medha Atre, James A. Hendler

Keywords: Billion Triples Challenge, Scalability, Parallel, Inferencing, Query, Triplestore

Abstract:

With a huge amount of RDF data available on the web, the ability to find and access relevant information is crucial. Traditional approaches to storing, querying, and reasoning fall short when faced with web-scale data. We present a system that combines the computational power of large clusters for enabling large-scale reasoning and data access with an efficient data structure for storing and querying the accessed data on a traditional personal computer or other resource-constrained device. We present results of using this system to load the 2009 Billion Triples Challenge dataset, materialize RDFS inferences, extract an “interesting” subset of the data using a large cluster, and further analyze the extracted data using a personal computer, all in the order of tens of minutes.

I wonder about the use of the phrase “…web-scale data?”

If a billion triples is a real challenge, then what happens when RDF/RDFa is deployed across an entity- and inference-rich body of material like legal texts? Or property descriptions? Or the ownership rights based on property descriptions?

In any event, the prep of the data for inferencing illustrates a use case for topic maps:

Information about people is represented in different ways in the BTC2009 dataset, including the use of the FOAF,7 SIOC,8 DBpedia,9 and AKT10 ontologies. We create a simple upper ontology to bring together concepts and properties pertaining to people. For example, we define the class up:Person which is defined as a superclass to existing person classes, e.g., foaf:Person. We do the same for relevant properties, e.g., up:full name is a superproperty of akt:full-name. Note that “up” is the namespace prefix for our upper ontology.

What subject represented by akt:full-name was responsible for the mapping in question? How does that translate to other ontologies? Oh, sorry, no place to record that mapping.
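For comparison, here is roughly what such a superproperty mapping looks like as data (Python with rdflib; the namespace URIs are stand-ins). Notice that the mapping is a bare triple, with nowhere to record who made it or on what basis:

from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

UP = Namespace("http://example.org/up#")                   # stand-in upper ontology
AKT = Namespace("http://www.aktors.org/ontology/portal#")  # stand-in AKT namespace

g = Graph()
# The whole mapping: one triple, no provenance, no rationale.
g.add((AKT["full-name"], RDFS.subPropertyOf, UP["full_name"]))
print(g.serialize(format="turtle"))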

Questions:

  1. How do you evaluate the claims of “…web-scale data?” (3-5 pages, citations)
  2. Does creating ad-hoc upper ontologies scale? Yes/No/Why? (3-5 pages, citations)
  3. How does interchange of ad-hoc upper ontologies work? (3-5 pages, citations)

Managing Terabytes of Web Semantics Data

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 11:00 am

Managing Terabytes of Web Semantics Data

Authors: Michele Catasta, Renaud Delbru, Nickolai Toupikov, and Giovanni Tummarello

Abstract:

A large amount of semi structured data is now made available on the Web in form of RDF, RDFa and Microformats. In this chapter, we discuss a general model for the Web of Data and, based on our experience in Sindice.com, we discuss how this is reflected in the architecture and components of a large scale infrastructure. Aspects such as data collection, processing, indexing, ranking are touched, and we give an ample example of an application built on top of said infrastructure.

Appears as Chapter 6 in R. De Virgilio et al. (eds.), Semantic Web Information Management, © Springer-Verlag Berlin Heidelberg 2010.

Hopefully not too repetitious with the other Sindice.com material I have been posting.

It is a good overview of the area, in addition to specifics about Sindice.com.

Semantic Now?

Filed under: Navigation,OWL,RDF,Semantic Web,Topic Maps — Patrick Durusau @ 10:58 am

Visit Semantic Web, then return here (or use a separate browser window).

I went to the Semantic Web page of the W3C looking for a prior presentation and was struck by the semantic now nature of the page.

It isn’t clear how to access older material.

I have to confess to having only a passing interest in self-promotional, puff pieces, including logos.

I assume that is true for many of the competent researchers working with the W3C. (There are a lot of them, this is not a criticism of their work.)

So, where is the interface that enables quick access to substantial materials, including older standards, statements and presentations?

*****
I understand at least some of the W3C site is described in RDF. What degree of detail, precision, I don’t know. Would make a starting point for a topic map of the site.

The other necessary component, and where this page falls down, would be useful navigation choices. That would be the harder problem.

Let me know if you are interested in cracking this nut.

