Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

November 9, 2012

Eleven SPARQL 1.1 Specifications Published

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 7:00 am

Eleven SPARQL 1.1 Specifications Published

From the post:

The SPARQL Working Group has today published a set of eleven documents, advancing most of SPARQL 1.1 to Proposed Recommendation. Building on the success of SPARQL 1.0, SPARQL 1.1 is a full-featured standard system for working with RDF data, including a query/update language, two HTTP protocols (one full-featured, one using basic HTTP verbs), three result formats, and other features which allow SPARQL endpoints to be combined and work together. Most features of SPARQL 1.1 have already been implemented by a range of SPARQL suppliers, as shown in our table of implementations and test results.

The Proposed Recommendations are:

  1. SPARQL 1.1 Overview – Overview of SPARQL 1.1 and the SPARQL 1.1 documents
  2. SPARQL 1.1 Query Language – A query language for RDF data.
  3. SPARQL 1.1 Update – Specifies additions to the query language to allow clients to update stored data
  4. SPARQL 1.1 Query Results JSON Format – How to use JSON for SPARQL query results
  5. SPARQL 1.1 Query Results CSV and TSV Formats – How to use comma-separated values (CSV) and tab-separated values (TSV) for SPARQL query results
  6. SPARQL Query Results XML Format – How to use XML for SPARQL query results. (This contains only minor, editorial updates from SPARQL 1.0, and is actually a Proposed Edited Recommendation.)
  7. SPARQL 1.1 Federated Query – an extension of the SPARQL 1.1 Query Language for executing queries distributed over different SPARQL endpoints.
  8. SPARQL 1.1 Service Description – a method for discovering and a vocabulary for describing SPARQL services.
  9. SPARQL 1.1 Protocol – a protocol for conveying SPARQL queries and updates between clients and SPARQL processors
  10. SPARQL 1.1 Graph Store HTTP Protocol – a lighter alternative to the full protocol, managing collections of graphs through basic HTTP verbs
  11. SPARQL 1.1 Entailment Regimes – defines the semantics of SPARQL queries under entailment regimes such as RDF Schema or OWL

While you are waiting for news on SPARQL performance increases, some reading material to pass the time.
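If you want a concrete taste of what 1.1 adds over 1.0 before diving into the specs, here is a minimal sketch (the ex: vocabulary and the data behind it are hypothetical) using aggregation, which SPARQL 1.0 did not support at all:

    PREFIX ex: <http://example.org/>

    # Authors with more than five papers -- GROUP BY/HAVING are new in 1.1
    SELECT ?author (COUNT(?paper) AS ?papers)
    WHERE {
      ?paper ex:author ?author .
    }
    GROUP BY ?author
    HAVING (COUNT(?paper) > 5)
    ORDER BY DESC(?papers)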

November 8, 2012

Federated SPARQL Queries [Take “Hit” From Multiple/Distributed Data Sets]

Filed under: BigData,RDF,SPARQL — Patrick Durusau @ 5:30 pm

On the Impact of Data Distribution in Federated SPARQL Queries by Nur Aini Rakhmawati and Michael Hausenblas.

Abstract:

With the growing number of publicly available SPARQL endpoints, federated queries become more and more attractive and feasible. Compared to queries against a single endpoint, queries that range over a number of endpoints pose new challenges, ranging from the type and number of datasets involved to the data distribution across the datasets. Existing research focuses on the data distribution in a central store and is mainly concerned with adopting well-known, traditional database techniques. In this work we investigate the impact of the data distribution in the context of federated SPARQL queries. We perform a number of experiments with four federation frameworks (Sesame Alibaba, Splendid, FedX, and Darq) against an RDF dataset, Dailymed, that we partition by graph and class. Our preliminary results confirm the intuition that the more datasets involved in query processing, the worse performance of federation query is and that the data distribution significantly influences the performance.

It isn’t often I read in the same paragraph:

With the growing number of publicly available SPARQL endpoints, federated queries become more and more attractive and feasible.

and

Our preliminary results confirm the intuition that the more datasets involved in query processing, the worse performance of federation query is and that the data distribution significantly influences the performance.

I have trouble reconciling “…more and more attractive and feasible” with “…the more datasets…the worse performance of federation query is….”

Particularly in the age of “big data,” where growing numbers of datasets and distributed data are the norm, not the exception.

I commend the authors for creating data points to confirm “intuitions” about SPARQL performance.

At the same time, their results raise serious questions about SPARQL in big data environments.
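To make the performance issue concrete: in a federated query, each SERVICE clause names a remote endpoint, and the intermediate bindings have to be shipped across the network and joined. A minimal sketch (the endpoint URLs and the ex: vocabulary are hypothetical):

    PREFIX ex: <http://example.org/>

    SELECT ?drug ?warning
    WHERE {
      # evaluated locally at the endpoint receiving the query
      ?drug ex:activeIngredient ?ingredient .
      # bindings for ?ingredient are joined against a remote endpoint
      SERVICE <http://endpoint-a.example.org/sparql> {
        ?ingredient ex:interactsWith ?other .
      }
      # every additional SERVICE clause adds another distributed join
      SERVICE <http://endpoint-b.example.org/sparql> {
        ?other ex:warning ?warning .
      }
    }

The more endpoints the bindings have to travel through, the more the network dominates query time, which is exactly the intuition the paper confirms.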

October 27, 2012

SPARQL and Big Data (and NoSQL) [Identifying Winners and Losers – Cui Bono?]

Filed under: BigData,NoSQL,SPARQL — Patrick Durusau @ 3:19 pm

SPARQL and Big Data (and NoSQL) by Bob DuCharme.

From the post:

How to pursue the common ground?

I think it’s obvious that SPARQL and other RDF-related technologies have plenty to offer to the overlapping worlds of Big Data and NoSQL, but this doesn’t seem as obvious to people who focus on those areas. For example, the program for this week’s Strata conference makes no mention of RDF or SPARQL. The more I look into it, the more I see that this flexible, standardized data model and query language align very well with what many of those people are trying to do.

But, we semantic web types can’t blame them for not noticing. If you build a better mouse trap, the world won’t necessarily beat a path to your door, because they have to find out about your mouse trap and what it does better. This requires marketing, which requires talking to those people in language that they understand, so I’ve been reading up on Big Data and NoSQL in order to better appreciate what they’re trying to do and how.

A great place to start is the excellent (free!) booklet Planning for Big Data by Edd Dumbill. (Others contributed a few chapters.) For a start, he describes data that “doesn’t fit the strictures of your database architectures” as a good candidate for Big Data approaches. That’s a good start for us. Here are a few longer quotes that I found interesting, starting with these two paragraphs from the section titled “Ingesting and Cleaning” after a discussion about collecting data from multiple different sources (something else that RDF and SPARQL are good at):

Bob has a very good point: marketing “…requires talking to those people in language that they understand….”

That is, no matter how “good” we think a solution may be, it won’t interest others until we explain it in terms they “get.”

But “marketing” requires more than a lingua franca.

Once an offer is made and understood, it must interest the other person. Or it is very poor marketing.

We may think that any sane person would jump at the chance to reduce the time and expense of data cleaning. But that isn’t necessarily the case.

I once made a proposal that would substantially reduce the time and expense for maintaining membership records. Records that spanned decades and were growing every year (hard copy). I made the proposal, thinking it would be well received.

Hardly. I was called into my manager’s office and got a lecture on how the department in question had more staff, a larger budget, etc., than any other department. They had no interest whatsoever in my proposal, and I should not presume to offer further advice. (Years later my suggestion was adopted when budget issues forced the issue.)

Efficient information flow interested me but not management.

Bob and the rest of us need to ask the traditional question: Cui bono? (To whose benefit?)

Semantic technologies, just like any other, have winners and losers.

To effectively market our wares, we need to identify both.

October 21, 2012

Relational Data to RDF [Bridge to Nowhere?]

Filed under: R2ML,RDF,SPARQL — Patrick Durusau @ 4:13 pm

Transforming Relational Data to RDF – R2RML Becomes Official W3C Recommendation by Eric Franzon.

From the post:

Today, the World Wide Web Consortium announced that R2RML has achieved Recommendation status. As stated on the W3C website, R2RML is “a language for expressing customized mappings from relational databases to RDF datasets. Such mappings provide the ability to view existing relational data in the RDF data model, expressed in a structure and target vocabulary of the mapping author’s choice.” In the life cycle of W3C standards creation, today’s announcement means that the specifications have gone through extensive community review and revision and that R2RML is now considered stable enough for wide-spread distribution in commodity software.

Richard Cyganiak, one of the Recommendation’s editors, explained why R2RML is so important. “In the early days of the Semantic Web effort, we’ve tried to convert the whole world to RDF and OWL. This clearly hasn’t worked. Most data lives in entrenched non-RDF systems, and that’s not likely to change.”

“That’s why technologies that map existing data formats to RDF are so important,” he continued. “R2RML builds a bridge between the vast amounts of existing data that lives in SQL databases and the SPARQL world. Having a standard for this makes SPARQL even more useful than it already is, because it can more easily access lots of valuable existing data. It also means that database-to-RDF middleware implementations can be more easily compared, which will create pressure on both open-source and commercial vendors, and will increase the level of play in the entire field.” (emphasis added)

If most data resides in non-RDF systems, what do I gain by converting it into RDF for querying with SPARQL?

Some possible costs:

  • Planning the conversion from the non-RDF to the RDF system
  • Debugging the conversion (unless it is trivial, the first few conversions won’t be right)
  • Developing the SPARQL queries
  • Debugging the SPARQL queries
  • Updating the conversion if new data is added to the source
  • Testing the SPARQL queries against updated data
  • Maintenance of the source and target RDF systems (unless pushing SPARQL is a way to urge conversion from the relational system)

Or to put it another way, if most data is still on non-RDF data stores, why do I need a bridge to SPARQL world?

Or is this a Sarah Palin bridge to nowhere?
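For readers who haven’t seen R2RML, here is a minimal mapping sketch in Turtle, the syntax R2RML is written in (the EMP table and the ex: vocabulary are illustrative, echoing the spec’s own examples). Every table you expose needs a map along these lines, which is where the planning and debugging costs listed above come from:

    @prefix rr: <http://www.w3.org/ns/r2rml#> .
    @prefix ex: <http://example.org/> .

    <#EmpMap>
      rr:logicalTable [ rr:tableName "EMP" ] ;
      # each row becomes a subject IRI minted from its primary key
      rr:subjectMap [
        rr:template "http://example.org/employee/{EMPNO}" ;
        rr:class ex:Employee
      ] ;
      # each mapped column becomes one predicate-object pair
      rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "ENAME" ]
      ] .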

July 24, 2012

SPARQL 1.1 Query Language [Last Call – 21 August 2012]

Filed under: RDF,SPARQL — Patrick Durusau @ 7:25 pm

SPARQL 1.1 Query Language

From the W3C News page:

The SPARQL Working Group has published a Last Call Working Draft of SPARQL 1.1 Query Language. RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports aggregation, subqueries, negation, creating values by expressions, extensible value testing, and constraining queries by source RDF graph. The results of SPARQL queries can be result sets or RDF graphs. Comments are welcome through 21 August.
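Two of the new capabilities named above, in a minimal sketch (the ex: vocabulary is hypothetical): creating values by expressions with BIND, and negation with FILTER NOT EXISTS, neither of which existed in SPARQL 1.0:

    PREFIX ex: <http://example.org/>

    SELECT ?item ?priceWithTax
    WHERE {
      ?item ex:price ?price .
      # creating values by expressions
      BIND (?price * 1.2 AS ?priceWithTax)
      # negation: keep only items with no ex:discontinued triple
      FILTER NOT EXISTS { ?item ex:discontinued true }
    }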

July 10, 2012

Linked Media Framework [Semantic Web vs. ROI]

Filed under: Linked Data,RDF,Semantic Web,SKOS,SPARQL — Patrick Durusau @ 11:08 am

Linked Media Framework

From the webpage:

The Linked Media Framework is an easy-to-setup server application that bundles central Semantic Web technologies to offer advanced services. The Linked Media Framework consists of LMF Core and LMF Modules.

LMF Usage Scenarios

The LMF has been designed with a number of typical use cases in mind. We currently support the following tasks out of the box:

Target groups are, in particular, casual users who are not experts in Semantic Web technologies but still want to publish or work with Linked Data, e.g. in the Open Government Data and Linked Enterprise Data areas.

It is a bad assumption that workers in business or government have free time to add semantics to their data sets.

If adding semantics to your data, by linked data or other means, is a core value, resource the task just like any other: staff it internally or hire outside help.

A Semantic Web shortcoming is the attitude that users are interested in, and have the time for, building it. Assuming the project to be worthwhile and/or doable in the first place.

Users are fully occupied with tasks of their own and don’t need a technical elite tossing more tasks onto them. You want the Semantic Web? Suggest you get on that right away.

Integrated data that meets a business need and has proven ROI isn’t the same thing as the Semantic Web. Give me a call if you are interested in the former, not the latter. (I would do the latter as well, but only on your dime.)

I first saw this at semanticweb.com, announcing version 2.2.0 of lmf – Linked Media Framework.

July 6, 2012

SparQLed…Writing SPARQL Queries [Less ZERO-result queries]

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 4:36 pm

SindiceTech Releases SparQLed As Open Source Project To Simplify Writing SPARQL Queries by Jennifer Zaino.

From the post:

SindiceTech today released SparQLed, the SindiceTech Assisted SPARQL Editor, as an open source project. SindiceTech, a spinoff company from the DERI Institute, commercializes large-scale, Big Data infrastructures for enterprises dealing with semantic data. It has roots in the semantic web index Sindice, which lets users collect, search, and query semantically marked-up web data (see our story here).

SparQLed also is one of the components of the commercial Sindice Suite for helping large enterprises build private linked data clouds. It is designed to give users all the help they need to write SPARQL queries to extract information from interconnected datasets.

“SPARQL is exciting but it’s difficult to develop and work with,” says Giovanni Tummarello, who led the efforts around the Sindice search and analysis engine and is founder and CEO of SindiceTech.

SparQLed Project page.

Maybe we have become spoiled by search engines that always return results, even bad ones:

With SQL, the advantage lies in having a schema which users can look at and understand how to write a query. RDF, on the other hand, has the advantage of providing great power and freedom, because information in RDF can be interconnected freely. But, Tummarello says, “with RDF there is no schema because there is all sorts of information from everywhere.” Without knowing which properties are available specifically for a certain URI and in what context, users can wind up writing queries that return no results and get frustrated by the constant iterating needed to achieve their ends.

I am not encouraged by a features list that promises:

Less ZERO-result queries
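In fairness, the zero-result problem is real, and the usual manual workaround is to ask the data what predicates it actually uses before writing the query you care about. A minimal sketch of that discovery step, which is roughly what an assisted editor automates:

    # Which predicates does this dataset use, and how often?
    SELECT ?p (COUNT(*) AS ?uses)
    WHERE { ?s ?p ?o }
    GROUP BY ?p
    ORDER BY DESC(?uses)
    LIMIT 50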

July 1, 2012

Cascading map-side joins over HBase for scalable join processing

Filed under: HBase,Joins,Linked Data,LOD,MapReduce,RDF,SPARQL — Patrick Durusau @ 4:45 pm

Cascading map-side joins over HBase for scalable join processing by Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Christopher Dorner, and Georg Lausen.

Abstract:

One of the major challenges in large-scale data processing with MapReduce is the smart computation of joins. Since Semantic Web datasets published in RDF have increased rapidly over the last few years, scalable join techniques become an important issue for SPARQL query processing as well. In this paper, we introduce the Map-Side Index Nested Loop Join (MAPSIN join) which combines scalable indexing capabilities of NoSQL storage systems like HBase, that suffer from an insufficient distributed processing layer, with MapReduce, which in turn does not provide appropriate storage structures for efficient large-scale join processing. While retaining the flexibility of commonly used reduce-side joins, we leverage the effectiveness of map-side joins without any changes to the underlying framework. We demonstrate the significant benefits of MAPSIN joins for the processing of SPARQL basic graph patterns on large RDF datasets by an evaluation with the LUBM and SP2Bench benchmarks. For most queries, MAPSIN join based query execution outperforms reduce-side join based execution by an order of magnitude.

Some topic map applications include Linked Data/RDF processing capabilities.

The salient comment here being: “For most queries, MAPSIN join based query execution outperforms reduce-side join based execution by an order of magnitude.”
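For readers wondering what a subject-subject join is: it is the join produced by a star-shaped basic graph pattern whose triple patterns share a subject variable. A minimal sketch (vocabulary loosely in the style of the LUBM benchmark, not taken from the paper):

    PREFIX ex: <http://example.org/university#>

    SELECT ?prof ?name ?dept
    WHERE {
      ?prof a ex:Professor .      # all three patterns bind ?prof, so a
      ?prof ex:name ?name .       # system that co-locates triples by
      ?prof ex:memberOf ?dept .   # subject can join them without a shuffle
    }

Joins on any other position force the data to be redistributed across the network, which is where the order-of-magnitude difference comes from.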

June 6, 2012

Recycling RDF and SPARQL

Filed under: Graphs,RDF,SPARQL — Patrick Durusau @ 7:48 pm

I was surprised to learn the W3C is recycling RDF and SPARQL for graph analytics:

RDF and SPARQL (both standards developed by the World Wide Web Consortium) [were developed] as the industry standard[s] for graph analytics.

It doesn’t hurt to repurpose those standards, assuming they are appropriate for graph analytics.

Or rather, assuming they are appropriate for your graph analytic needs.

BTW, there is a contest to promote recycling of RDF and SPARQL with a $70,000 first prize:

YarcData Announces $100,000 Big Data Graph Analytics Challenge

From the post:

At the 2012 Semantic Technology & Business Conference in San Francisco, YarcData, a Cray company, has announced the planned launch of a “Big Data” contest featuring $100,000 in prizes. The YarcData Graph Analytics Challenge will recognize the best submissions for solutions of un-partitionable, Big Data graph problems.

YarcData is holding the contest to showcase the increasing applicability and adoption of graph analytics in solving Big Data problems. The contest also is intended to promote the use and development of RDF and SPARQL (both standards developed by the World Wide Web Consortium) as the industry standard for graph analytics.

“Graph databases have a significant role to play in analytic environments, and they can solve problems like relationship discovery that other traditional technologies do not handle easily,” said Philip Howard, Research Director, Bloor Research. “YarcData driving thought leadership in this area will be positive for the overall graph database market, and this contest could help expand the use of RDF and SPARQL as valuable tools for solving Big Data problems.”

The grand prize for the first place winner is $70,000. The second place winner will receive $10,000, and the third place winner will receive $5,000. There also will be additional prizes for the other finalists. Contest judges, which will include a combination of Big Data industry analysts, experts from academia and semantic web, and YarcData customers, will review the submissions and select the 10 best contestants.

The YarcData Graph Analytics Challenge will officially begin on Tuesday, June 26, 2012, and winners will be announced during a live Web event on December 4, 2012. Full contest details, including specific criteria and the contest judges, will be announced on June 26. To pre-register for a contest information packet, please visit the YarcData website at www.yarcdata.com. Information packets will be sent out June 26. The contest will be open only to those individuals who are eligible to participate under U.S. and other applicable laws and regulations.

Full details to follow on June 26, 2012.

May 10, 2012

Simple federated queries with RDF [Part 1]

Filed under: Federation,RDF,SPARQL — Patrick Durusau @ 4:12 pm

Simple federated queries with RDF [Part 1]

Bob DuCharme writes:

A few more triples to identify some relationships, and you’re all set.

[side note] Easy aggregation without conversion is where semantic web technology shines the brightest.

Once, at an XML Summer School session, I was giving a talk about semantic web technology to a group that included several presenters from other sessions. This included Henry Thompson, who I’ve known since the SGML days. He was still a bit skeptical about RDF, and said that RDF was in the same situation as XML—that if he and I stored similar information using different vocabularies, we’d still have to convert his to use the same vocabulary as mine or vice versa before we could use our data together. I told him he was wrong—that easy aggregation without conversion is where semantic web technology shines the brightest.

I’ve finally put together an example. Let’s say that I want to query across his address book and my address book together for the first name, last name, and email address of anyone whose email address ends with “.org”. Imagine that his address book uses the vCard vocabulary and the Turtle syntax and looks like this,

Bob is an expert in more areas of markup, SGML/XML, SPARQL and other areas than I can easily count. Not to mention being a good friend.

Take a look at Bob’s post and decide for yourself how “simple” the federation is, following Bob’s technique.

I am just going to let it speak for itself today.

I will outline obvious and some not so obvious steps in Bob’s “simple” federated queries in Part II.
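Meanwhile, for a rough idea of the kind of query Bob describes, here is my sketch, not his code (the vCard property names are assumptions on my part): first name, last name, and email for addresses ending in “.org”:

    PREFIX v: <http://www.w3.org/2006/vcard/ns#>

    SELECT ?first ?last ?email
    WHERE {
      ?card v:given-name  ?first ;
            v:family-name ?last ;
            v:email       ?email .
      # keep only addresses ending in ".org"
      FILTER (STRENDS(STR(?email), ".org"))
    }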

February 22, 2012

Meronymy SPARQL Database Server To Debut With Emphasis on High Performance

Filed under: Linked Data,Meronymy,SPARQL — Patrick Durusau @ 4:47 pm

Meronymy SPARQL Database Server To Debut With Emphasis on High Performance

From the post:

Coming in June from start-up Meronymy is a new RDF enterprise database management system, the Meronymy SPARQL Database Server. The company, founded by Inge Henriksen, began life because of the need he saw for a high-performance and more scalable RDF database server.

The idea to focus on a database server exclusively oriented to Linked Data and the Semantic Web came as a result of Henriksen’s work over the last decade as an IT consultant implementing many semantic solutions for customers in sectors such as government and education. “One issue that always came up was performance,” he explains, especially when performing more advanced SPARQL queries against triple stores using filters, for example.

“Once the data reached a certain size, which it often did very quickly, the size of the data became unmanageable and we had to fall back on caching and the like to resolve these performance issues.” The problem there is that caching isn’t compatible with situations where there is a need for real-time data.

A closed beta is due out soon. Register at Meronymy.

January 16, 2012

Introducing Meronymy SPARQL Database Server

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 2:33 pm

Introducing Meronymy SPARQL Database Server

Inge Henriksen writes:

I am pleased to announce today that the Meronymy SPARQL Database Server is ready for release later in 2012. Meronymy SPARQL Database Server is a high performance RDF Enterprise Database Management System (DBMS).

Our goal has been to make a really fast, ACID, OS portable, user friendly, secure, SPARQL-driven RDF database server usable with most programming languages.

Let’s not start any language wars about Meronymy being written in C++/assembly, 😉 and concentrate on its performance in actual use.

Suggestions for RDF data sets to test that performance? (Knowing Inge, I trust it is fast, but the question is how fast and under what circumstances.)

Or other RDF engines to test alongside it?

PS: If you don’t know SPARQL, check out Learning SPARQL by Bob DuCharme.

January 13, 2012

Meronymy SPARQL Database Server

Filed under: RDF,SPARQL — Patrick Durusau @ 8:15 pm

Meronymy SPARQL Database Server

Inge Henriksen writes:

We are pleased to announce that the Meronymy SPARQL Database Server is ready for release later in 2012. Those interested in our RDF database server software should consider registering today; those that do get exclusive early access to beta software in the upcoming closed beta testing period, insider news on the development progress, get to submit feature requests, and otherwise directly influence the finished product.

From the FAQ we learn some details:

A: All components in the database server and its drivers have been programmed from scratch so that we could optimize them for performance.
We developed the database server in C++ since this programming language has the most potential for optimization; there is also some inline assembly at key locations in the code.
Some more components that make our database management system very fast:

  • In-process query optimizer; determines the most efficient way to execute a query.
  • In-process memory manager; for much faster memory allocation and deallocation than the operating system can provide.
  • In-process multithreaded HTTP server; for a much faster SPARQL Protocol endpoint than through a standard out-of-process web server.
  • In-process multithreaded TCP/IP listener with thread pooling; for efficient thread management.
  • In-process directly coded lexical analyzer; for efficient query parsing.
  • Snapshot isolation; for fast transaction processing.
  • B+ trees; for fast indexing.
  • In-process stream-oriented XML parser; for fast RDF/XML parsing.
  • An RDF data model; no data model abstraction layers, which results in faster processing of data.

I’m signing up for the beta. How about you?

December 13, 2011

UMBEL Services, Part 1: Overview

Filed under: Ontology,Open Semantic Framework,SPARQL — Patrick Durusau @ 9:52 pm

UMBEL Services, Part 1: Overview

From the post:

UMBEL, the Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts and a vocabulary designed for domain ontologies and ontology mapping [1]. When we first released UMBEL in mid-2008 it was accompanied by a number of Web services and a SPARQL endpoint, and general APIs. In fact, these were the first Web services developed for release by Structured Dynamics. They were the prototypes for what later became the structWSF Web services framework, which incorporated many lessons learned and better practices.

By the time that the structWSF framework had evolved with many additions to comprise the Open Semantic Framework (OSF), those original UMBEL Web services had become quite dated. Thus, upon the last major update to UMBEL to version 1.0 back in February of this year, we removed these dated services.

Like what I earlier mentioned about the cobbler’s children being the last to get new shoes, it has taken us a bit to upgrade the UMBEL services. However, I am pleased to announce we have now completed the transition of UMBEL’s earlier services to use the OSF framework, and specifically the structWSF platform-independent services. As a result, there are both upgraded existing services and some exciting new ones. We will now be using UMBEL as one of our showcases for these expanding OSF features. We will be elaborating upon these features throughout this series, some parts of which will appear on Fred Giasson’s blog.

In this first part, we provide a broad overview of the new UMBEL OSF implementation. We also begin to foretell some of the parts to come that will describe some of these features in more detail.

There are three more parts that follow this one.

If you have the time, I am interested in your take on this resource.

A lot of time and effort has gone into making this a useful site, so what parts do you like best/least? What would you change?

More to follow on this one.

December 9, 2011

British Museum Semantic Web Collection Online

Filed under: British Museum,Linked Data,SPARQL — Patrick Durusau @ 8:24 pm

British Museum Semantic Web Collection Online

From the webpage:

Welcome to this Linked Data and SPARQL service. It provides access to the same collection data available through the Museum’s web presented Collection Online, but in a computer readable format. The use of the W3C open data standard, RDF, allows the Museum’s collection data to join and relate to a growing body of linked data published by other organisations around the world interested in promoting accessibility and collaboration.

The data has also been organised using the CIDOC-CRM (Conceptual Reference Model) crucial for harmonising with other cultural heritage data. The current version is beta and development work continues to improve the service. We hope that the service will be used by the community to develop friendly web applications that are freely available to the community.

Please use the SPARQL menu item to use the SPARQL user interface or click here.

With the British National Bibliography and the British Museum both accessible via SPARQL, and Bob DuCharme’s Learning SPARQL book, the excuses for not knowing SPARQL cold are few and far between.

November 25, 2011

SPARQL 1.1 Overview

Filed under: SPARQL — Patrick Durusau @ 4:23 pm

SPARQL 1.1 Overview

From the webpage:

Abstract:

This document is an overview of SPARQL 1.1. It provides an introduction to a set of W3C specifications that facilitate querying and manipulating RDF graph content on the Web or in an RDF store. (First Public Working Draft)

Not a deep introduction but does include enough pointers and other material that it is worth reading.

November 17, 2011

Last Call Working Draft of SPARQL 1.1 Federated Query

Filed under: Federated Search,Query Language,SPARQL — Patrick Durusau @ 8:39 pm

Last Call Working Draft of SPARQL 1.1 Federated Query

From the W3C:

A Last Call Working Draft of SPARQL 1.1 Federated Query, which offers data consumers an opportunity to merge data distributed across the Web from multiple SPARQL query services. Comments on this working draft are welcome before 31 December 2011.

Some “lite” holiday reading. 😉

October 14, 2011

MongoGraph – MongoDB Meets the Semantic Web

Filed under: MongoDB,RDF,Semantic Web,SPARQL — Patrick Durusau @ 6:24 pm

MongoGraph – MongoDB Meets the Semantic Web

From the post (Franz Inc.):

Recorded Webcast: MongoGraph – MongoDB Meets the Semantic Web From October 12, 2011

MongoGraph is an effort to bring the Semantic Web to MongoDB developers. We implemented a MongoDB interface to AllegroGraph to give Javascript programmers both Joins and the Semantic Web. JSON objects are automatically translated into triples and both the MongoDB query language and SPARQL work against your objects.

Join us for this webcast to learn more about working on the level of objects instead of individual triples, where an object would be defined as all the triples with the same subject. We’ll discuss the simplicity of the MongoDB interface for working with objects and all the properties of an advanced triplestore, in this case joins through SPARQL queries, automatic indexing of all attributes/values, ACID properties all packaged to deliver a simple entry into the world of the Semantic Web.

I haven’t watched the video, yet, but:

working on the level of objects instead of individual triples, where an object would be defined as all the triples with the same subject.

certainly caught my eye.

Curious whether this means simply using the triples as sources of values and not “reasoning” with them?
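In SPARQL terms, fetching one such “object” is just gathering every triple that shares a subject. A minimal sketch (the subject URI is hypothetical):

    # Everything asserted about one subject -- an "object" in MongoGraph's sense
    CONSTRUCT { <http://example.org/person/42> ?p ?o }
    WHERE     { <http://example.org/person/42> ?p ?o }

DESCRIBE <http://example.org/person/42> gets you much the same thing with less typing, though what DESCRIBE returns is implementation-defined.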

October 4, 2011

SILK – Link Discovery Framework Version 2.5 released

Filed under: Linked Data,LOD,RDF,Semantic Web,SPARQL — Patrick Durusau @ 7:54 pm

SILK – Link Discovery Framework Version 2.5 released

I was quite excited to see under “New Data Transformations”…”Merge Values of multiple inputs.”

But the documentation for Transformation must be lagging behind, or I have a different understanding of what it means to “Merge Values of multiple inputs.”

Perhaps I should ask: What does SILK mean by “Merge Values of multiple inputs”?

Picking out an issue that is of particular interest to me is not meant to be a negative comment on the project. An impressive bit of work for any EU funded project.

Another question: Has anyone looked at the SILK- Link Specification Language (SILK-LSL) as an input into declaring equivalence/processing for arbitrary data objects? Just curious.

Robert Isele posted this announcement about SILK on October 3, 2011:

we are happy to announce version 2.5 of the Silk Link Discovery Framework for the Web of Data.

The Silk framework is a tool for discovering relationships between data items within different Linked Data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web. Using the declarative Silk – Link Specification Language (Silk-LSL), developers can specify the linkage rules data items must fulfill in order to be interlinked. These linkage rules may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language.

Linkage rules can either be written manually or developed using the Silk Workbench. The Silk Workbench is a web application which guides the user through the process of interlinking different data sources.

Version 2.5 includes the following additions to the last major release 2.4:

(1) Silk Workbench now includes a function to learn linkage rules from the reference links. The learning function is based on genetic programming and capable of learning complex linkage rules. Similar to a genetic algorithm, genetic programming starts with a randomly created population of linkage rules. From that starting point, the algorithm iteratively transforms the population into a population with better linkage rules by applying a number of genetic operators. As soon as either a linkage rule with a full f-Measure has been found or a specified maximum number of iterations is reached, the algorithm stops and the user can select a linkage rule.

(2) A new sampling tab allows for fast creation of the reference link set. It can be used to bootstrap the learning algorithm by generating a number of links which are then rated by the user either as correct or incorrect. In this way positive and negative reference links are defined which in turn can be used to learn a linkage rule. If a previous learning run has already been executed, the sampling tries to generate links which contain features which are not yet covered by the current reference link set.

(3) The new help sidebar provides the user with a general description of the current tab as well as with suggestions for the next steps in the linking process. As new users are usually not familiar with the steps involved in interlinking two data sources, the help sidebar currently provides basic guidance to the user and will be extended in future versions.

(4) Introducing per-comparison thresholds:

  • On popular request, thresholds can now be specified on each comparison.
  • Backwards-compatible: Link specifications using a global threshold can still be executed.

(5) New distance measures:

  • Jaccard Similarity
  • Dice’s coefficient
  • DateTime Similarity
  • Tokenwise Similarity, contributed by Florian Kleedorfer, Research Studios Austria

(6) New data transformations:

  • RemoveEmptyValues
  • Tokenizer
  • Merge Values of multiple inputs

(7) New DataSources and Outputs

  • In addition to reading from SPARQL endpoints, Silk now also supports reading from RDF dumps in all common formats. Currently the data set is held in memory and it is not available in the Workbench yet, but future versions will improve this.
  • New SPARQL/Update Output: In addition to writing the links to a file, Silk now also supports writing directly to a triple store using SPARQL/Update.

(8) Various improvements and bugfixes

———————————————————————————

More information about the Silk Link Discovery Framework is available at:

http://www4.wiwiss.fu-berlin.de/bizer/silk/

The Silk framework is provided under the terms of the Apache License, Version 2.0 and can be downloaded from:

http://www4.wiwiss.fu-berlin.de/bizer/silk/releases/

The development of Silk was supported by Vulcan Inc. as part of its Project Halo (www.projecthalo.com) and by the EU FP7 project LOD2-Creating Knowledge out of Interlinked Data (http://lod2.eu/, Ref. No. 257943).

Thanks to Christian Becker, Michal Murawicki and Andrea Matteini for contributing to the Silk Workbench.

September 21, 2011

Dydra

Filed under: Dydra,RDF,SPARQL — Patrick Durusau @ 7:07 pm

Dydra

From What is Dydra?:

Dydra

Dydra is a cloud-based graph database. Whether you’re using existing social network APIs or want to build your own, Dydra treats your customers’ social graph as exactly that.

With Dydra, your data is natively stored as a property graph, directly representing the relationships in the underlying data.

Expressive

With Dydra, you access and update your data via an industry-standard query language specifically designed for graph processing, SPARQL. It’s easy to use and we provide a handy in-browser query editor to help you learn.

From the QuickStart:

Dydra is an RDF store meant to be quick and easy for developers. Getting started quickly will require already being familiar with RDF and SPARQL.

OK, so yes a “graph database,” but in the sense of being an RDF store.

Under What is RDF? -> Overview, the site authors say:

The use of URIs allows multiple data sources to talk about the same entities using the same language.

Really? That must mean all the 303 stuff that no less than Tim Berners-Lee and others have been talking about is unnecessary. I understand that several years ago that was the W3C “position,” but leaving aside all my ranting, it isn’t quite the current position.

There is a fundamental ambiguity when an address is used as an identifier. Does it identify what you find at the location it specifies, or is it simply an identifier, with whatever is at the location being additional information about what the address identifies?

The prose is out of date or the authors have a seriously dated view of RDF. Either way, it doesn’t inspire a lot of confidence.


September 11, 2011

New Challenges in Distributed Information Filtering and Retrieval

New Challenges in Distributed Information Filtering and Retrieval

Proceedings of the 5th International Workshop on New Challenges in Distributed Information Filtering and Retrieval
Palermo, Italy, September 17, 2011.

Edited by:

Cristian Lai – CRS4, Loc. Piscina Manna, Building 1 – 09010 Pula (CA), Italy

Giovanni Semeraro – Dept. of Computer Science, University of Bari, Aldo Moro, Via E. Orabona, 4, 70125 Bari, Italy

Eloisa Vargiu – Dept. of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123 Cagliari, Italy

Table of Contents:

  1. Experimenting Text Summarization on Multimodal Aggregation
    Giuliano Armano, Alessandro Giuliani, Alberto Messina, Maurizio Montagnuolo, Eloisa Vargiu
  2. From Tags to Emotions: Ontology-driven Sentiment Analysis in the Social Semantic Web
    Matteo Baldoni, Cristina Baroglio, Viviana Patti, Paolo Rena
  3. A Multi-Agent Decision Support System for Dynamic Supply Chain Organization
    Luca Greco, Liliana Lo Presti, Agnese Augello, Giuseppe Lo Re, Marco La Cascia, Salvatore Gaglio
  4. A Formalism for Temporal Annotation and Reasoning of Complex Events in Natural Language
    Francesco Mele, Antonio Sorgente
  5. Interaction Mining: the new Frontier of Call Center Analytics
    Vincenzo Pallotta, Rodolfo Delmonte, Lammert Vrieling, David Walker
  6. Context-Aware Recommender Systems: A Comparison Of Three Approaches
    Umberto Panniello, Michele Gorgoglione
  7. A Multi-Agent System for Information Semantic Sharing
    Agostino Poggi, Michele Tomaiuolo
  8. Temporal characterization of the requests to Wikipedia
    Antonio J. Reinoso, Jesus M. Gonzalez-Barahona, Rocio Muñoz-Mansilla, Israel Herraiz
  9. From Logical Forms to SPARQL Query with GETARUN
    Rocco Tripodi, Rodolfo Delmonte
  10. ImageHunter: a Novel Tool for Relevance Feedback in Content Based Image Retrieval
    Roberto Tronci, Gabriele Murgia, Maurizio Pili, Luca Piras, Giorgio Giacinto

September 2, 2011

Improving the recall of decentralised linked data querying through implicit knowledge

Filed under: Linked Data,LOD,SPARQL — Patrick Durusau @ 8:02 pm

Improving the recall of decentralised linked data querying through implicit knowledge by Jürgen Umbrich, Aidan Hogan, and Axel Polleres.

Abstract:

Aside from crawling, indexing, and querying RDF data centrally, Linked Data principles allow for processing SPARQL queries on-the-fly by dereferencing URIs. Proposed link-traversal query approaches for Linked Data have the benefits of up-to-date results and decentralised (i.e., client-side) execution, but operate on incomplete knowledge available in dereferenced documents, thus affecting recall. In this paper, we investigate how implicit knowledge – specifically that found through owl:sameAs and RDFS reasoning – can improve the recall in this setting. We start with an empirical analysis of a large crawl featuring 4m Linked Data sources and 1.1g quadruples: we (1) measure expected recall by only considering dereferenceable information, (2) measure the improvement in recall given by considering rdfs:seeAlso links as previous proposals did. We further propose and measure the impact of additionally considering (3) owl:sameAs links, and (4) applying lightweight RDFS reasoning (specifically ρDF) for finding more results, relying on static schema information. We evaluate our methods for live queries over our crawl.

From the document:

owl:sameAs links are used to expand the set of query relevant sources, and owl:sameAs rules are used to materialise implicit knowledge given by the OWL semantics, potentially generating additional answers.

I have always thought that knowing the “why” an owl:sameAs would make it more powerful. But since any basis for subject sameness can be used, that may not be the case.
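A sketch of the kind of expansion the paper measures (the URI and vocabulary are hypothetical): a SPARQL 1.1 property path that follows owl:sameAs links in both directions, zero or more steps, so that answers asserted against any equivalent URI are picked up:

    PREFIX owl:  <http://www.w3.org/2002/07/owl#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?name
    WHERE {
      # zero or more sameAs hops, in either direction
      <http://example.org/resource/X> (owl:sameAs|^owl:sameAs)* ?alias .
      ?alias foaf:name ?name .
    }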

August 24, 2011

Sesame 2.5.0 Release

Filed under: RDF,Sesame,SPARQL — Patrick Durusau @ 7:00 pm

Sesame 2.5.0 Release

From the webpage:

  • SPARQL 1.1 Query Language support
    Sesame 2.5 features near-complete support for the SPARQL 1.1 Query Language Last Call Working Draft, including all new builtin functions and operators, improved aggregation behavior, and more.
  • SPARQL 1.1 Update support
    Sesame 2.5 has full support for the new SPARQL 1.1 Update Working Draft. The Repository API has been extended to support creation of SPARQL Update operations, and the SAIL API has been extended to allow Update operations to be passed directly to the underlying backend implementation for optimized execution. Also, the Sesame Workbench application has been extended to allow easy execution of SPARQL Update operations on your repositories.
  • SPARQL 1.1 Protocol support
    Sesame 2.5 fully supports the SPARQL 1.1 Protocol for RDF Working Draft. The Sesame REST protocol has been extended to allow update operations via SPARQL on repositories. A Sesame server therefore now automatically publishes any repository as a fully compliant SPARQL endpoint.
  • Binary RDF support
    Sesame 2.5 includes a new binary RDF serialization format, derived from the existing binary tuple results format. Its main features are reduced parsing overhead and minimal memory requirements (for handling really long literals, among other things).
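As a sketch of what the Update support mentioned above accepts (the graph name and ex: vocabulary are hypothetical), here are two operations in one request, separated by a semicolon:

    PREFIX ex: <http://example.org/>

    # add new triples to a named graph ...
    INSERT DATA {
      GRAPH <http://example.org/graph/staff> { ex:alice ex:role ex:Editor . }
    } ;
    # ... then rewrite existing triples in the same request
    DELETE { ?s ex:role ex:Intern }
    INSERT { ?s ex:role ex:Editor }
    WHERE  { ?s ex:role ex:Intern }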

July 27, 2011

Learning SPARQL

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 8:35 am

Learning SPARQL by Bob DuCharme.

From the author’s announcement (email):

It’s the only complete book on the W3C standard query language for linked data and the semantic web, and as far as I know the only book at all that covers the full range of SPARQL 1.1 features such as the ability to update data. The book steps you through simple examples that can all be performed with free software, and all sample queries, data, and output are available on the book’s website.

In the words of one reviewer, “It’s excellent—very well organized and written, a completely painless read. I not only feel like I understand SPARQL now, but I have a much better idea why RDF is useful (I was a little skeptical before!)” I’d like to thank everyone who helped in the review process and everyone who offered to help, especially those in the Charlottesville/UVa tech community.

You can follow news about the book and about SPARQL on Twitter at @learningsparql.

Remembering Bob’s “SGML CD,” I ordered a copy (electronic and print) of “Learning SPARQL” as soon as I saw the announcement in my inbox.

More comments to follow.

July 22, 2011

You Too Can Use Hadoop Inefficiently!!!

Filed under: Algorithms,Graphs,Hadoop,RDF,SPARQL — Patrick Durusau @ 6:15 pm

The headline Hadoop’s tremendous inefficiency on graph data management (and how to avoid it) certainly got my attention.

But when you read the paper, Scalable SPARQL Querying of Large RDF Graphs, it isn’t Hadoop’s “tremendous inefficiency,” but actually that of SHARD, an RDF triple store that uses flat text files for storage.

Or as the authors say in their paper (6.3 Performance Comparison):

Figure 6 shows the execution time for LUBM in the four benchmarked systems. Except for query 6, all queries take more time on SHARD than on the single-machine deployment of RDF-3X. This is because SHARD’s use of hash partitioning only allows it to optimize subject-subject joins. Every other type of join requires a complete redistribution of data over the network within a Hadoop job, which is extremely expensive. Furthermore, its storage layer is not at all optimized for RDF data (it stores data in flat files).

Saying that SHARD (not as well known as Hadoop) was using Hadoop inefficiently would not have the “draw” of allegations about Hadoop’s failure to process graph data efficiently.

Sure, I write blog lines for “draw” but let’s ‘fess up in the body of the blog article. Readers shouldn’t have to run down other sources to find the real facts.

July 4, 2011

Translating SPARQL queries into SQL using R2RML

Filed under: R2RML,SPARQL,SQL,TMQL — Patrick Durusau @ 6:04 pm

Translating SPARQL queries into SQL using R2RML

From the post:

The efficient translation of SPARQL into SQL is an active field of research in academia and in industry. In fact, a number of triple stores are built as a layer on top of a relational solution. Support for SPARQL in these RDF stores presupposes the translation of the SPARQL query into a SQL query that can be executed against a certain relational schema.

Some foundational papers in the field include “A Relational Algebra for SPARQL” by Richard Cyganiak, which translates the semantics of SPARQL as they were finally defined by the W3C into Relational Algebra semantics, and “Semantics preserving SPARQL-to-SQL translation” by Chebotko, Lu and Fotouhi, which introduces an algorithm to translate SPARQL queries to SQL queries.

This latter paper is especially interesting because the translation mechanism is parametric on the underlying relational schema. This makes it possible to adapt their translation mechanism to any relational database using a pair of mapping functions, alpha and beta, which map a triple pattern of the SPARQL query to a table, and a triple pattern plus a position in the triple to a column in the database.

Provided that R2RML offers a generic mechanism for the description of relational databases, in order to support SPARQL queries in any R2RML RDF graph, we just need to find an algorithm that receives as an input the R2RML mapping and builds the mapping functions required by Chebotko et alter algorithm.

The most straightforward way to accomplish that is to use the R2RML mapping to generate a virtual table with a single relation holding only subject, predicate and object. The mapping for this table is trivial. A possible implementation of this algorithm can be found in the following Clojure code. (I added links to the Cyganiak and Chebotko papers.)
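To make the single-relation idea concrete, here is a hand-worked sketch (mine, not from the post; the table and vocabulary names are hypothetical) of the SQL such a translation produces: each triple pattern becomes an alias of the triples table and each shared variable becomes a join condition:

    -- SPARQL: SELECT ?name WHERE { ?p ex:worksFor ex:Acme . ?p ex:name ?name }
    SELECT t2.object AS name
    FROM   triples t1
    JOIN   triples t2 ON t1.subject = t2.subject  -- shared variable ?p
    WHERE  t1.predicate = 'ex:worksFor'
      AND  t1.object    = 'ex:Acme'
      AND  t2.predicate = 'ex:name';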

I recommend this post, as well as the Cyganiak and Chebotko papers, as background reading to anyone interested in TMQL. Other suggestions?

May 24, 2011

Cassa

Filed under: RDF,SPARQL,Topic Maps — Patrick Durusau @ 10:25 am

Cassa

From the webpage:

A SPARQL 1.1 Graph Store HTTP Protocol [1] implementation for RDF and Topic Maps.

[1] SPARQL 1.1 Graph Store HTTP Protocol

The somewhat longer announcement on topicmapmail, SPARQL 1.1 Graph Store HTTP Protocol for Topic Maps:

Last week I discovered the SPARQL 1.1 Graph Store HTTP Protocol [1] and wondered if this wouldn’t be a good alternative to SDShare [2].

The graph store protocol uses no artificial technologies like Atom but uses REST and RDF consistently. The service uses an ontology [3] to inform the client about available graphs etc.

The protocol allows creation of graphs, deletion of graphs, updating of graphs, and discovery of graphs (through the service description).

The protocol is rather generic, so it’s usable for Topic Maps as well (graph == topic map).

The protocol provides no fragments/snapshots like SDShare, though. Adding this functionality to the protocol would be interesting, I’d think. I.e., each graph update would trigger a new fragment. Maybe this functionality would also solve the “push problem” [4] without inventing yet another syntax. The description of the available fragments should also be done with an ontology and not solely with Atom, though.

Anyway, I wanted to mention it as a good, *dogfooding* protocol which could be used for Topic Maps.

I created an implementation (Cassa) of the protocol at [5] (no release yet). The implementation supports Topic Maps and RDF, but it doesn’t provide the service description yet. And I haven’t translated the service description ontology to Topic Maps yet.

[1] <http://www.w3.org/TR/2011/WD-sparql11-http-rdf-update-20110512/>
[2] <http://www.egovpt.org/fg/CWA_Part_1b>
[3] <http://www.w3.org/TR/2011/WD-sparql11-service-description-20110512/>
[4] <http://www.infoloom.com/pipermail/topicmapmail/2010q4/008761.html>
[5] <https://github.com/heuer/cassa>
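For readers who haven’t met the protocol: its HTTP operations line up with the graph-management operations of SPARQL 1.1 Update. A rough sketch of the correspondence (graph and resource names hypothetical):

    # HTTP PUT on a graph ~ replace its contents
    DROP SILENT GRAPH <http://example.org/graphs/g1> ;
    INSERT DATA {
      GRAPH <http://example.org/graphs/g1> { <http://example.org/s> <http://example.org/p> "o" }
    } ;
    # HTTP POST ~ merge more triples into the graph
    INSERT DATA {
      GRAPH <http://example.org/graphs/g1> { <http://example.org/s> <http://example.org/p2> "o2" }
    } ;
    # HTTP DELETE ~ remove the graph entirely
    DROP GRAPH <http://example.org/graphs/g1>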

May 18, 2011

Balisage 2011 Preliminary Program

Filed under: Conferences,Data Mining,RDF,SPARQL,XPath,XQuery,XSLT — Patrick Durusau @ 6:40 pm

At-A-Glance

Program (in full)

From the announcement (Tommie Usdin):

Topics this year include:

  • multi-ended hypertext links
  • optimizing XSLT and XQuery processing
  • interchange, interoperability, and packaging of XML documents
  • eBooks and epub
  • overlapping markup and related topics
  • visualization
  • encryption
  • data mining

The acronyms this year include:

XML XSLT XQuery XDML REST XForms JSON OSIS XTemp RDF SPARQL XPath

New this year will be:

Lightning talks: an opportunity for participants to say what they think, simply, clearly, and persuasively.

As I have said before, simply the best conference of the year!

Conference site: http://www.balisage.net/

Registration: http://www.balisage.net/registration.html

May 13, 2011

SPARQL 1.1 Drafts – Last Call

Filed under: Query Language,RDF,SPARQL — Patrick Durusau @ 7:19 pm

SPARQL 1.1 Drafts – Last Call

From the W3C News:

May 5, 2011

SPARQL by Example (with Cheatsheet)

Filed under: Query Language,SPARQL — Patrick Durusau @ 1:45 pm

SPARQL by Example

SPARQL by Example: The Cheatsheet

Good introductory materials.

Recall that MaJorToM and Maiana both support SPARQL queries.
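If you want something to paste into one of those while reading, here is a first query in the by-example spirit (it assumes the data uses FOAF, as most tutorials do):

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    # Ten people and, where recorded, someone they know
    SELECT ?person ?friend
    WHERE {
      ?person a foaf:Person .
      OPTIONAL { ?person foaf:knows ?friend }
    }
    LIMIT 10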
