Another Word For It
Patrick Durusau on Topic Maps and Semantic Diversity

August 27, 2012

Reasoning with the Variation Ontology using Apache Jena #OWL #RDF

Filed under: Bioinformatics,Jena,OWL,RDF,Reasoning — Patrick Durusau @ 1:46 pm

Reasoning with the Variation Ontology using Apache Jena #OWL #RDF by Pierre Lindenbaum.

From the post:

The Variation Ontology (VariO), “is an ontology for standardized, systematic description of effects, consequences and mechanisms of variations”.

In this post I will use the Apache Jena library for RDF to load this ontology. It will then be used to extract a set of variations that are a sub-class of a given class of Variation.
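
The extraction step described there comes down to a sub-class query. As a minimal SPARQL sketch (Pierre’s post does the work through the Jena API, and the class URI below is a placeholder, not a real VariO term):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Variations that are direct or transitive sub-classes of a chosen class.
# <http://example.org/VariO_root> is a placeholder; substitute the VariO
# class URI you are interested in.
SELECT ?variation ?label
WHERE {
  ?variation rdfs:subClassOf* <http://example.org/VariO_root> .
  OPTIONAL { ?variation rdfs:label ?label }
}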

If you are interested in this example, you may also be interested in the Variation Ontology.

The VariO homepage reports:

VariO allows

  • consistent naming
  • annotation of variation effects
  • data integration
  • comparison of variations and datasets
  • statistical studies
  • development of software tools

It isn’t clear, on a quick read, how VariO accomplishes:

  • data integration
  • comparison of variations and datasets

Unless it means that uniform recordation using VariO enables “data integration” and “comparison of variations and datasets”?

True, but what nomenclature, uniformly used, does not enable “data integration” and “comparison of variations and datasets”?

Is there one?

August 21, 2012

Putting WorldCat Data Into A Triple Store

Filed under: Library,Linked Data,RDF,WorldCat — Patrick Durusau @ 10:32 am

Putting WorldCat Data Into A Triple Store by Richard Wallis.

From the post:

I can not really get away with making a statement like “Better still, download and install a triplestore [such as 4Store], load up the approximately 80 million triples and practice some SPARQL on them” and then not following it up.

I made it in my previous post Get Yourself a Linked Data Piece of WorldCat to Play With in which I was highlighting the release of a download file containing RDF descriptions of the 1.2 million most highly held resources in WorldCat.org – to make the cut, a resource had to be held by more than 250 libraries.

So here for those that are interested is a step by step description of what I did to follow my own encouragement to load up the triples and start playing.

Have you loaded the WorldCat linked data into a triple store?

Some other storage mechanism?

August 16, 2012

Get Yourself a Linked Data Piece of WorldCat to Play With

Filed under: Library,RDF,WorldCat — Patrick Durusau @ 7:26 pm

Get Yourself a Linked Data Piece of WorldCat to Play With by Richard Wallis.

From the post:

You may remember my frustration a couple of months ago, at being in the air when OCLC announced the addition of Schema.org marked up Linked Data to all resources in WorldCat.org. Those of you who attended the OCLC Linked Data Round Table at IFLA 2012 in Helsinki yesterday, will know that I got my own back on the folks who publish the press releases at OCLC, by announcing the next WorldCat step along the Linked Data road whilst they were still in bed.

The Round Table was an excellent, very interactive session with Neil Wilson from the British Library, Emmanuelle Bermes from Centre Pompidou, and Martin Malmsten of the National Library of Sweden, which I will cover elsewhere. For now, you will find my presentation Library Linked Data Progress on my SlideShare site.

After we experimentally added RDFa embedded linked data, using Schema.org markup and some proposed Library extensions, to WorldCat pages, one of the questions I was most often asked was where can I get my hands on some of this raw data?

We are taking the application of linked data to WorldCat one step at a time so that we can learn from how people use and comment on it. So at that time if you wanted to see the raw data the only way was to use a tool [such as the W3C RDFA 1.1 Distiller] to parse the data out of the pages, just as the search engines do.

So I am really pleased to announce that you can now download a significant chunk of that data as RDF triples. Especially in experimental form, providing the whole lot as a download would have been a bit of a challenge, even just in disk space and bandwidth terms. So which chunk to choose was a question. We could have chosen a random selection, but decided instead to pick the most popular, in terms of holdings, resources in WorldCat – an interesting selection in its own right.

To make the cut, a resource had to be held by more than 250 libraries. It turns out that almost 1.2 million fall into this category, so a sizeable chunk indeed. To get your hands on this data, download the 1Gb gzipped file. It is in RDF n-triples form, so you can take a look at the raw data in the file itself. Better still, download and install a triplestore [such as 4Store], load up the approximately 80 million triples and practice some SPARQL on them.
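
If you do load it, a first practice query is easy to come by. The descriptions use Schema.org markup, so something along these lines should pull back titles and creators (the property names here are assumed from the Schema.org vocabulary; adjust to what you actually find in the dump):

PREFIX schema: <http://schema.org/>

# Titles and (optional) creators, assuming Schema.org markup as described
SELECT ?work ?name ?creator
WHERE {
  ?work schema:name ?name .
  OPTIONAL { ?work schema:creator ?creator }
}
LIMIT 25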

That’s a nice sized collection of data. In any format.

But the next-to-last sentence of the post reads:

As I say in the press release, posted after my announcement, we are really interested to see what people will do with this data.

Déjà vu?

I think I have heard that question asked with other linked data releases. You? Pointers?

I first saw this at SemanticWeb.com.

August 14, 2012

A Direct Mapping of Relational Data to RDF

Filed under: RDB,RDF — Patrick Durusau @ 3:36 pm

A Direct Mapping of Relational Data to RDF from the RDB2RDF Working Group.

From the news:

The need to share data with collaborators motivates custodians and users of relational databases (RDB) to expose relational data on the Web of Data. This document defines a direct mapping from relational data to RDF. This definition provides extension points for refinements within and outside of this document. Comments are welcome through 15 September. (emphasis added)

Comments to: public-rdb2rdf-comments@w3.org.

Subscribe (prior to commenting).
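
To make the idea concrete: under the direct mapping, each table row becomes a subject and each column becomes a predicate built from the table and column names, so a query over the mapped graph looks roughly like this (the table and column names are invented for illustration):

PREFIX emp: <http://example.com/base/Employee#>

# Employee rows exposed as RDF: column names become predicates
SELECT ?row ?name ?dept
WHERE {
  ?row emp:name ?name ;
       emp:department ?dept .
}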

R2RML: RDB to RDF Mapping Language

Filed under: Analytics,BigData,R2RML,RDB,RDF — Patrick Durusau @ 3:29 pm

R2RML: RDB to RDF Mapping Language from the RDB2RDF Working Group.

From the news:

This document describes R2RML, a language for expressing customized mappings from relational databases to RDF datasets. Such mappings provide the ability to view existing relational data in the RDF data model, expressed in a structure and target vocabulary of the mapping author’s choice. R2RML mappings are themselves RDF graphs and written down in Turtle syntax. R2RML enables different types of mapping implementations. Processors could, for example, offer a virtual SPARQL endpoint over the mapped relational data, or generate RDF dumps, or offer a Linked Data interface. Comments are welcome through 15 September. (emphasis added)

Subscribe (prior to commenting).

Comments to: public-rdb2rdf-comments@w3.org.

July 29, 2012

Linked Data: Esperanto for APIs?

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 9:40 am

Michael Vizard writes in: Linked Data to Take Programmable Web to a Higher Level:

The whole concept of a programmable Web may just be too important to rely solely on APIs. That’s the thinking behind a Linked Data Working Group initiative led by the W3C that expects to create a standard for embedding URLs directly within application code to more naturally integrate applications. Backed by vendors such as IBM and EMC, the core idea is to create more reliable method for integrating applications that more easily scales by not creating unnecessary dependencies of APIs and middleware.

At the moment most of the hopes for a truly programmable Web are tied to an API model that is inherently flawed. That doesn’t necessarily mean that Linked Data approaches will eliminate the need for APIs. But in terms of making the Web a programmable resource, Linked Data represents a significant advance in terms of both simplifying the process of actually integrating data while simultaneously reducing dependencies on cumbersome middleware technologies that are expensive to deploy and manage.

Conceptually, linked data is an obvious idea. But getting everybody to agree on an actual standard is another matter. At the very least, however, a generally accepted approach to linking data within applications that make the whole programmable Web concept more accessible to developers of almost every skill level should not be all that far off from here. (emphasis added)

I am often critical of Linked Data efforts so let’s be clear:

Linked Data, as a semantic identification method, has strengths and weaknesses, just like any other semantic identification method. If it works for your particular application, great!

One of my objections to Linked Data is its near religious promotion as a remedy for semantic diversity. I don’t think a remedy for semantic diversity is possible, nor is it desirable.

The semantic diversity in IT is like the genetic diversity in the plant and animal kingdoms. It is responsible for robustness and innovation.

This is not the fault of Linked Data, but it is often paired with explanations for the failure of the Semantic Web to thrive.

The first Scientific American “puff piece” on the Semantic Web was more than a decade ago now. We suddenly learn that it hasn’t been a failure of user interest, adoption, etc., that has defeated the Semantic Web, but a flawed web API model. Cure that and semantic nirvana is just around the corner.

The Semantic Web has failed to thrive because the forces of semantic diversity are more powerful than any effort at semantic sameness.

The history of natural languages and near daily appearance of new programming languages, to say nothing of the changing semantics of both, are evidence for “forces of semantic diversity.”

To paraphrase Johnny Cash, “do we kick against the pricks (semantic diversity)” or build systems that take it into account?

July 24, 2012

SPARQL 1.1 Query Language [Last Call – 21 August 2012]

Filed under: RDF,SPARQL — Patrick Durusau @ 7:25 pm

SPARQL 1.1 Query Language

From the W3C News page:

The SPARQL Working Group has published a Last Call Working Draft of SPARQL 1.1 Query Language. RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports aggregation, subqueries, negation, creating values by expressions, extensible value testing, and constraining queries by source RDF graph. The results of SPARQL queries can be result sets or RDF graphs. Comments are welcome through 21 August.
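
To give a flavor of the 1.1 additions (aggregation, subqueries, negation, and friends), a query like the following is now expressible directly in the language; the vocabulary is invented for illustration:

PREFIX ex: <http://example.org/library#>

# Authors with more than five books, ignoring any book flagged as withdrawn
SELECT ?author (COUNT(?book) AS ?titles)
WHERE {
  ?book ex:author ?author .
  FILTER NOT EXISTS { ?book ex:status ex:Withdrawn }
}
GROUP BY ?author
HAVING (COUNT(?book) > 5)
ORDER BY DESC(?titles)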

July 23, 2012

Dilbert schematics

Filed under: Associations,RDF,Scope,Topic Maps — Patrick Durusau @ 2:17 pm

Dilbert schematics

In November of 2011, Dan Brickley writes:

How can we package, manage, mix and merge graph datasets that come from different contexts, without getting our data into a terrible mess?

During the last W3C RDF Working Group meeting, we were discussing approaches to packaging up ‘graphs’ of data into useful chunks that can be organized and combined. A related question, one always lurking in the background, was also discussed: how do we deal with data that goes out of date? Sometimes it is better to talk about events rather than changeable characteristics of something. So you might know my date of birth, and that is useful forever; with a bit of math and knowledge of today’s date, you can figure out my current age, whenever needed. So ‘date of birth’ on this measure has an attractive characteristic that isn’t shared by ‘age in years’.

At any point in time, I have at most one ‘age in years’ property; however, you can take two descriptions of me that were at some time true, and merge them to form a messy, self-contradictory description. With this in mind, how far should we be advocating that people model using time-invariant idioms, versus working on better packaging for our data so it is clearer when it was supposed to be true, or which parts might be more volatile?
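
As an aside, the “bit of math” mentioned above is easy enough to write down in SPARQL 1.1. A rough sketch, with an invented vocabulary, that also shows the wrinkle it glosses over (the result is off by one until the birthday has passed in the current year):

PREFIX ex: <http://example.org/people#>

# Derive a volatile "age in years" from the time-invariant date of birth
SELECT ?person ((YEAR(NOW()) - YEAR(?dob)) AS ?ageInYears)
WHERE {
  ?person ex:dateOfBirth ?dob .
}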

Interesting to read as an issue for RDF modeling.

Not difficult to solve using scopes on associations in a topic map.

Question: What difficulties do time-invariant idioms introduce for modeling? What difficulties do non-time-invariant idioms introduce for processing?*

These are different concerns, and it isn’t enough to have an answer to a modeling issue without understanding the implications of the answer.

*Hint: As I read the post, it assumes a shared, “objective” notion of time. Perhaps works for the cartoon world, but what about elsewhere?

July 22, 2012

Stardog Quick Start

Filed under: RDF,Stardog — Patrick Durusau @ 5:44 pm

After downloading Stardog and a license key, I turned to the “Stardog Quick Start” page at: {install directory}/stardog-1.0.2/docs/manual/quick-start/index.html.

What follows is an annotated version of that page that reports “dumb” and perhaps not so “dumb” mistakes I made to get the server running and the first database loaded. (I have also included what output to expect if you are successful at each step.)

First, tell Stardog where its home directory (where databases and other files will be stored) is. If you’re using some weird Unix shell that doesn’t create environment variables in this way, adjust accordingly; Stardog requires STARDOG_HOME to be defined:

$ export STARDOG_HOME=/data/stardog

I edited my .bashrc file to insert this statement.

Be sure to remember to open another shell so that the variable gets set for that window. (Yes, I forgot the first time around.)

Second, copy the stardog-license-key.bin (you’ll get this either with an evaluation copy of Stardog or with a licensed copy) into place:

$ cp stardog-license-key.bin $STARDOG_HOME

Of course stardog-license-key.bin has to be readable by the Stardog process.

Mine defaulted to 664 but it is a good idea to check.

Third, start the Stardog server. By default the server will expose SNARL and HTTP interfaces—on ports 5820 and 5822, respectively.

$ ./stardog-admin server start

If successful, you will see:

Starting Stardog server in background, see /home/patrick/working/stardog-1.0.2/stardog.log for more information.

************************************************************
This copy of Stardog is licensed to Patrick Durusau (patrick@durusau.net), Patrick Durusau
This is a Community license
This license does not expire.
************************************************************

                                                             :;   
                                      ;;                   `;`:   
  `'+',    ::                        `++                    `;:`  
 +###++,  ,#+                        `++                    .     
 ##+.,',  '#+                         ++                     +    
,##      ####++  ####+:   ##,++` .###+++   .####+    ####++++#    
`##+     ####+'  ##+#++   ###++``###'+++  `###'+++  ###`,++,:     
 ####+    ##+        ++.  ##:   ###  `++  ###  `++` ##`  ++:      
  ###++,  ##+        ++,  ##`   ##;  `++  ##:   ++; ##,  ++:      
    ;+++  ##+    ####++,  ##`   ##:  `++  ##:   ++' ;##'#++       
     ;++  ##+   ###  ++,  ##`   ##'  `++  ##;   ++:  ####+        
,.   +++  ##+   ##:  ++,  ##`   ###  `++  ###  .++  '#;           
,####++'  +##++ ###+#+++` ##`   :####+++  `####++'  ;####++`      
`####+;    ##++  ###+,++` ##`    ;###:++   `###+;   `###++++      
                                                    ##   `++      
                                                   .##   ;++      
                                                    #####++`      
                                                     `;;;.        

************************************************************
Stardog server 1.0.2 started on Sun Jul 22 16:54:01 EDT 2012.

SNARL server running on snarl://localhost:5820/
HTTP server running on http://localhost:5822/.
Stardog documentation accessible at http://localhost:5822/docs
SNARL & HTTP servers listening on all interfaces

STARDOG_HOME=/home/patrick/working/stardog-1.0.2 

Fourth, create a database with an input file; use the --server parameter to specify which server:

$ ./stardog-admin create -n myDB -t D -u admin -p admin --server snarl://localhost:5820/ examples/data/University0_0.owl

Gotcha! Would you believe that University0_0.owl has two 0 (zero) digits in the name, not letter O’s?

Violent disagreement notwithstanding, it is always bad practice to use easily confused letters and digits in file names. Always.

If you are successful you will see:

Bulk loading data to new database.
Data load complete. Loaded 8,521 triples in 00:00:01 @ 8.1K triples/sec.
Successfully created database 'myDB'.

Fifth, optionally, admire the pure RDF bulk loading power…woof!

OK. 😉

Sixth, query the database:

$ ./stardog query -c http://localhost:5822/myDB -q "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"

If successful, you will see:

Executing Query:

SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10

+--------------------------------------------------------+
|                           s                            |
+--------------------------------------------------------+
| http://api.stardog.com                                 |
| http://www.University0.edu                             |
| http://www.Department0.University0.edu                 |
| http://www.Department0.University0.edu/FullProfessor0  |
| http://www.Department0.University0.edu/Course0         |
| http://www.Department0.University0.edu/GraduateCourse0 |
| http://www.Department0.University0.edu/GraduateCourse1 |
| http://www.University84.edu                            |
| http://www.University875.edu                           |
| http://www.University241.edu                           |
+--------------------------------------------------------+

Query returned 10 results in 00:00:00.093

If you happen to make any mistakes, you may want to be aware of:

./stardog-admin drop -n myDB

😉

Your performance will vary but these notes may save you a few minutes and some annoyance in getting Stardog up and running.

July 12, 2012

Semantator: annotating clinical narratives with semantic web ontologies

Filed under: Annotation,Ontology,Protégé,RDF,Semantator,Semantic Web — Patrick Durusau @ 2:40 pm

Semantator: annotating clinical narratives with semantic web ontologies by Dezhao Song, Christopher G. Chute, and Cui Tao. (AMIA Summits Transl Sci Proc. 2012;2012:20-9. Epub 2012 Mar 19.)

Abstract:

To facilitate clinical research, clinical data needs to be stored in a machine processable and understandable way. Manual annotating clinical data is time consuming. Automatic approaches (e.g., Natural Language Processing systems) have been adopted to convert such data into structured formats; however, the quality of such automatically extracted data may not always be satisfying. In this paper, we propose Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. With a loaded free text document and an ontology, Semantator supports the creation/deletion of ontology instances for any document fragment, linking/disconnecting instances with the properties in the ontology, and also enables automatic annotation by connecting to the NCBO annotator and cTAKES. By representing annotations in Semantic Web standards, Semantator supports reasoning based upon the underlying semantics of the owl:disjointWith and owl:equivalentClass predicates. We present discussions based on user experiences of using Semantator.

If you are an AMIA member, see above for the paper. If not, see: Semantator: annotating clinical narratives with semantic web ontologies (PDF file). And the software/webpage: Semantator.

Software is a plugin for Protege 4.1 or higher.

Looking at the extensive screen shots at the website, which has good documentation, the first question I would ask a potential user is: “Are you comfortable with Protege?” If they aren’t, I suspect you are going to invest a lot of time teaching them ontologies and Protege. Just an FYI.

Complex authoring tools, particularly for newbies, seem like a non-starter to me. For example, why not have a standalone entity extractor (but don’t call it that, call it “I See You (ISY)”) that uses a preloaded entity file to recognize entities in a text? Where there is uncertainty, those are displayed in a different color, with drop-down options for possible other entities. Users get to pick one from the list (no write-in ballots). That is a step towards getting clean data for a second round with another one-trick-pony tool. Users contribute, we all benefit.

Which brings me to the common shortfall of annotation solutions: the requirement that the text to be annotated be in plain text.

There are a lot of “text” documents but what of those in Word, PDF, Postscript, PPT, Excel, to say nothing of other formats?

The past will not disappear for want of a robust annotation solution.

Nor should it.

July 10, 2012

Linked Media Framework [Semantic Web vs. ROI]

Filed under: Linked Data,RDF,Semantic Web,SKOS,SPARQL — Patrick Durusau @ 11:08 am

Linked Media Framework

From the webpage:

The Linked Media Framework is an easy-to-setup server application that bundles central Semantic Web technologies to offer advanced services. The Linked Media Framework consists of LMF Core and LMF Modules.

LMF Usage Scenarios

The LMF has been designed with a number of typical use cases in mind. We currently support the following tasks out of the box:

Target groups are in particular casual users who are not experts in Semantic Web technologies but still want to publish or work with Linked Data, e.g. in the Open Government Data and Linked Enterprise Data area.

It is a bad assumption that workers in business or government have free time to add semantics to their data sets.

If adding semantics to your data, by linked data or other means is a core value, resource the task just like any other with your internal staff or hire outside help.

A Semantic Web shortcoming is the attitude that users are interested in, or have the time for, building it, assuming the project to be worthwhile and/or doable.

Users are fully occupied with tasks of their own and don’t need a technical elite tossing more tasks onto them. You want the Semantic Web? Suggest you get on that right away.

Integrated data that meets a business need and has proven ROI isn’t the same thing as the Semantic Web. Give me a call if you are interested in the former, not the latter. (I would do the latter as well, but only on your dime.)

I first saw this at semanticweb.com, announcing version 2.2.0 of lmf – Linked Media Framework.

July 6, 2012

SparQLed…Writing SPARQL Queries [Less ZERO-result queries]

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 4:36 pm

SindiceTech Releases SparQLed As Open Source Project To Simplify Writing SPARQL Queries by Jennifer Zaino.

From the post:

SindiceTech today released SparQLed, the SindiceTech Assisted SPARQL Editor, as an open source project. SindiceTech, a spinoff company from the DERI Institute, commercializes large-scale, Big Data infrastructures for enterprises dealing with semantic data. It has roots in the semantic web index Sindice, which lets users collect, search, and query semantically marked-up web data (see our story here).

SparQLed also is one of the components of the commercial Sindice Suite for helping large enterprises build private linked data clouds. It is designed to give users all the help they need to write SPARQL queries to extract information from interconnected datasets.

“SPARQL is exciting but it’s difficult to develop and work with,” says Giovanni Tummarello, who led the efforts around the Sindice search and analysis engine and is founder and CEO of SindiceTech.

SparQLed Project page.

Maybe we have become spoiled by search engines that always return results, even bad ones:

With SQL, the advantage lies in having a schema which users can look at and understand how to write a query. RDF, on the other hand, has the advantage of providing great power and freedom, because information in RDF can be interconnected freely. But, Tummarello says, “with RDF there is no schema because there is all sorts of information from everywhere.” Without knowing which properties are available specifically for a certain URI and in what context, users can wind up writing queries that return no results and get frustrated by the constant iterating needed to achieve their ends.

I am not encouraged by a features list that promises:

Less ZERO-result queries

July 1, 2012

Cascading map-side joins over HBase for scalable join processing

Filed under: HBase,Joins,Linked Data,LOD,MapReduce,RDF,SPARQL — Patrick Durusau @ 4:45 pm

Cascading map-side joins over HBase for scalable join processing by Martin Przyjaciel-Zablocki, Alexander Schätzle, Thomas Hornung, Christopher Dorner, and Georg Lausen.

Abstract:

One of the major challenges in large-scale data processing with MapReduce is the smart computation of joins. Since Semantic Web datasets published in RDF have increased rapidly over the last few years, scalable join techniques become an important issue for SPARQL query processing as well. In this paper, we introduce the Map-Side Index Nested Loop Join (MAPSIN join) which combines scalable indexing capabilities of NoSQL storage systems like HBase, that suffer from an insufficient distributed processing layer, with MapReduce, which in turn does not provide appropriate storage structures for efficient large-scale join processing. While retaining the flexibility of commonly used reduce-side joins, we leverage the effectiveness of map-side joins without any changes to the underlying framework. We demonstrate the significant benefits of MAPSIN joins for the processing of SPARQL basic graph patterns on large RDF datasets by an evaluation with the LUBM and SP2Bench benchmarks. For most queries, MAPSIN join based query execution outperforms reduce-side join based execution by an order of magnitude.

Some topic map applications include Linked Data/RDF processing capabilities.

The salient comment here is: “For most queries, MAPSIN join based query execution outperforms reduce-side join based execution by an order of magnitude.”
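
For context, the joins in question arise from ordinary basic graph patterns. Even a small LUBM-style query like this one (simplified) needs a join on ?student, which is exactly the step MAPSIN moves to the map side:

PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>

# Two triple patterns sharing ?student force a join during evaluation
SELECT ?student ?course
WHERE {
  ?student ub:memberOf <http://www.Department0.University0.edu> .
  ?student ub:takesCourse ?course .
}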

June 24, 2012

Stardog 1.0

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 8:19 pm

Stardog 1.0 by Kendall Clark.

From the post:

Today I’m happy to announce the release of Stardog 1.0, the fastest, smartest, and easiest to use RDF database on the planet. Stardog fills a hole in the Semantic Technology (and NoSQL database) market for an RDF database that is fast, zero config, lightweight, and feature-rich.

Speed Kills

RDF and OWL are excellent technologies for building data integration and analysis apps. Those apps invariably require complex query processing, i.e., queries where there are lots of joins, complex logical conditions to evaluate, etc. Stardog is targeted at query performance for complex SPARQL queries. We publish performance data so you can see how we’re doing.

Braindead Simple Deployment

Winners ship. Period.

We care very much about simple deployments. Stardog works out-of-the-box with minimal (none, typically) configuration. You shouldn’t have to fight an RDF database for days to install or tune it for great performance. Because Stardog is pure Java, it will run anywhere. It just works and it’s damn fast. You shouldn’t need to buy and configure a cluster of machines to get blazing fast performance from an RDF database. And now you don’t have to.

One More Thing…OWL Reasoning

Finally, Stardog has the deepest, most comprehensive, and best OWL reasoning support of any commercial RDF database available.

Stardog 1.0 supports RDFS, OWL 2 QL, EL, and RL, as well as OWL 2 DL schema-reasoning. It’s also the only RDF database to support closed-world integrity constraint validation and automatic explanations of integrity constraint violations.

If you care about data quality, Stardog 1.0 is worth a hard look.

OK, so I have signed up for an evaluation version, key, etc. Email just arrived.

Downloaded software and license key.

With all the open data lying around, it should not be hard to find test data.

More to follow. Comments welcome.

June 6, 2012

Recycling RDF and SPARQL

Filed under: Graphs,RDF,SPARQL — Patrick Durusau @ 7:48 pm

I was surprised to learn the W3C is recycling RDF and SPARQL for graph analytics:

RDF and SPARQL (both standards developed by the World Wide Web Consortium) [were developed] as the industry standard[s] for graph analytics.

It doesn’t hurt to repurpose those standards, assuming they are appropriate for graph analytics.

Or rather, assuming they are appropriate for your graph analytic needs.

BTW, there is a contest to promote recycling of RDF and SPARQL with a $70,000 first prize:

YarcData Announces $100,000 Big Data Graph Analytics Challenge

From the post:

At the 2012 Semantic Technology & Business Conference in San Francisco, YarcData, a Cray company, has announced the planned launch of a “Big Data” contest featuring $100,000 in prizes. The YarcData Graph Analytics Challenge will recognize the best submissions for solutions of un-partitionable, Big Data graph problems.

YarcData is holding the contest to showcase the increasing applicability and adoption of graph analytics in solving Big Data problems. The contest also is intended to promote the use and development of RDF and SPARQL (both standards developed by the World Wide Web Consortium) as the industry standard for graph analytics.

“Graph databases have a significant role to play in analytic environments, and they can solve problems like relationship discovery that other traditional technologies do not handle easily,” said Philip Howard, Research Director, Bloor Research. “YarcData driving thought leadership in this area will be positive for the overall graph database market, and this contest could help expand the use of RDF and SPARQL as valuable tools for solving Big Data problems.”

The grand prize for the first place winner is $70,000. The second place winner will receive $10,000, and the third place winner will receive $5,000. There also will be additional prizes for the other finalists. Contest judges, which will include a combination of Big Data industry analysts, experts from academia and semantic web, and YarcData customers, will review the submissions and select the 10 best contestants.

The YarcData Graph Analytics Challenge will officially begin on Tuesday, June 26, 2012, and winners will be announced during a live Web event on December 4, 2012. Full contest details, including specific criteria and the contest judges, will be announced on June 26. To pre-register for a contest information packet, please visit the YarcData website at www.yarcdata.com. Information packets will be sent out June 26. The contest will be open only to those individuals who are eligible to participate under U.S. and other applicable laws and regulations.

Full details to follow on June 26, 2012.

June 2, 2012

dipLODocus[RDF]

Filed under: RDF,Semantic Web — Patrick Durusau @ 6:17 pm

dipLODocus[RDF]

From the webpage:

dipLODocus[RDF] is a new system for RDF data processing supporting both simple transactional queries and complex analytics efficiently. dipLODocus[RDF] is based on a novel hybrid storage model considering RDF data both from a graph perspective (by storing RDF subgraphs or RDF molecules) and from a “vertical” analytics perspective (by storing compact lists of literal values for a given attribute).

Overview

Our system is built on three main structures: RDF molecule clusters (which can be seen as hybrid structures borrowing both from property tables and RDF subgraphs), template lists (storing literals in compact lists as in a column-oriented database system) and an efficient hash-table indexing URIs and literals based on the clusters they belong to.

Figure below gives a simple example of a few molecule clusters—storing information about students—and of a template list—compactly storing lists of student IDs. Molecules can be seen as horizontal structures storing information about a given object instance in the database (like rows in relational systems). Template lists, on the other hand, store vertical lists of values corresponding to one type of object (like columns in a relational system).

Interesting performance numbers:

  • 30x RDF-3X on LUBM queries
  • 350x Virtuoso on analytic queries

It combines data structures as opposed to adopting a single approach.

Perhaps data structures will be explored and optimized for data, rather than the other way around?

dipLODocus[RDF] | Short and Long-Tail RDF Analytics for Massive Webs of Data by Marcin Wylot, Jigé Pont, Mariusz Wisniewski, and Philippe Cudré-Mauroux (paper – PDF).

I first saw this at SemanticWeb.com.

June 1, 2012

Are You Going to Balisage?

Filed under: Conferences,RDF,RDFa,Semantic Web,XML,XML Database,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 2:48 pm

To the tune of “Are You Going to Scarborough Fair:”

Are you going to Balisage?
Parsley, sage, rosemary and thyme.
Remember me to one who is there,
she once was a true love of mine.

Tell her to make me an XML shirt,
Parsley, sage, rosemary, and thyme;
Without any seam or binary code,
Then she shall be a true lover of mine.

….

Oh, sorry! There you will see:

  • higher-order functions in XSLT
  • Schematron to enforce consistency constraints
  • relation of the XML stack (the XDM data model) to JSON
  • integrating JSON support into XDM-based technologies like XPath, XQuery, and XSLT
  • XML and non-XML syntaxes for programming languages and documents
  • type introspection in XQuery
  • using XML to control processing in a document management system
  • standardizing use of XQuery to support RESTful web interfaces
  • RDF to record relations among TEI documents
  • high-performance knowledge management system using an XML database
  • a corpus of overlap samples
  • an XSLT pipeline to translate non-XML markup for overlap into XML
  • comparative entropy of various representations of XML
  • interoperability of XML in web browsers
  • XSLT extension functions to validate OCL constraints in UML models
  • ontological analysis of documents
  • statistical methods for exploring large collections of XML data

Balisage is an annual conference devoted to the theory and practice of descriptive markup and related technologies for structuring and managing information. Participants typically include XML users, librarians, archivists, computer scientists, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, Topic-Map enthusiasts, semantic-Web evangelists, members of the working groups which define the specifications, academics, industrial researchers, representatives of governmental bodies and NGOs, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Discussion is open, candid, and unashamedly technical.

The Balisage 2012 Program is now available at: http://www.balisage.net/2012/Program.html

May 19, 2012

Searching For An Honest Engineer

Filed under: Google Knowledge Graph,RDF,Semantic Web — Patrick Durusau @ 7:28 pm

Sean Golliher needs to take his lantern, to search for an honest engineer at the W3C.

Sean writes in Google Just Hi-jacked the Semantic Web Vocabulary:

Google announced they’re rolling out new enhancements to their search technology and they’re calling it the “Knowledge Graph.” For those involved in the Semantic Web Google’s “Knowledge Graph” is nothing new. After watching the video, and reading through the announcements, the Google engineers are giving the impression, to those familiar with this field, that they have created something new and innovative.

While it’s commendable that Google is improving search, it’s interesting to note the direct translations of Google’s “new language” to the existing semantic web vocabulary. Normally engineers and researchers quote, or at least reference, the original sources of their ideas. One can’t help but notice that the semantic web isn’t mentioned in any of Google’s announcements. After watching the different reactions from the semantic web community I found that many took notice of the language Google used and how the ideas from the semantic web were repackaged as “new” and discovered by Google.

Did you know that the W3C invented the ideas for:

  • Knowledge Graph
  • Relationships Between things
  • Naming things Better (Taxonomy?)
  • Objects/Entities
  • Ambiguous Language (Semantics?)
  • Connecting Things
  • discover new, and relevant, things you like (Serendipity?)
  • meaning (Semantic?)
  • graph (RDF?)
  • things (URIs (Linked Data)?)
  • real-world entities and their relationships to one another: things (Linked Data?)

?

Really? Semantic, serendipity, graph, relationships between real-world entities?

All invented by the W3C and/or carefully crediting prior work.

Right.

Good luck with your search Sean.

May 17, 2012

“…Things, Not Strings”

Filed under: Google Knowledge Graph,Marketing,RDF,RDFa,Semantic Web,Topic Maps — Patrick Durusau @ 6:30 pm

The brilliance at Google spreads beyond technical chops and into their marketing department.

Effective marketing can be as much about what you don’t do as what you do.

What did Google not do with the Google Knowledge Graph?

Google Knowledge Graph does not require users to:

  • learn RDF/RDFa
  • learn OWL
  • learn various syntaxes
  • build/choose ontologies
  • use SW software
  • wait for authoritative instructions from Mount W3C

What does Google Knowledge Graph do?

It gives users information about things, things that are of interest to users. Using their web browsers.

Let’s see, we can require users to do what we want, or, we can give users what they want.

Which one do you think is the most likely to succeed? (No peeking!)

May 15, 2012

Using “Punning” to Answer httpRange-14

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 6:50 pm

Using “Punning” to Answer httpRange-14

Jeni Tennison writes in her introduction:

As part of the TAG’s work on httpRange-14, Jonathan Rees has assessed how a variety of use cases could be met by various proposals put before the TAG. The results of the assessment are a matrix which shows that “punning” is the most promising method, unique in not failing on either ease of use (use case J) or HTTP consistency (use case M).

In normal use, “punning” is about making jokes based around a word that has two meanings. In this context, “punning” is about using the same URI to mean two (or more) different things. It’s most commonly used as a term of art in OWL but normal people don’t need to worry particularly about that use. Here I’ll explore what that might actually mean as an approach to the httpRange-14 issue.

Jeni writes quite well and if you are really interested in the details of this self-inflicted wound, read her post in its entirety.

The post is summarized when she says:

Thus an implication of this approach is that the people who define languages and vocabularies must specify what aspect of a resource a URI used in a particular way identifies.

Her proposal makes disambiguation explicit, a strategy that is more likely to be successful than others.

Following that statement she treats how to usefully proceed from that position. (No guarantee her position will carry the day but it would be a good thing if it does.)

May 14, 2012

Web Developers Can Now Easily “Play” with RDFa

Filed under: RDF,RDFa,Semantic Web — Patrick Durusau @ 9:16 am

Web Developers Can Now Easily “Play” with RDFa by Eric Franzon.

From the post:

Yesterday, we announced RDFa.info, a new site devoted to helping developers add RDFa (Resource Description Framework-in-attributes) to HTML.

Building on that work, the team behind RDFa.info is announcing today the release of “PLAY,” a live RDFa editor and visualization tool. This release marks a significant step in providing tools for web developers that are easy to use, even for those unaccustomed to working with RDFa.

“Play” is an effort that serves several purposes. It is an authoring environment and markup debugger for RDFa that also serves as a teaching and education tool for Web Developers. As Alex Milowski, one of the core RDFa.info team, said, “It can be used for purposes of experimentation, documentation (e.g. crafting an example that produces certain triples), and testing. If you want to know what markup will produce what kind of properties (triples), this tool is going to be great for understanding how you should be structuring your own data.”

A useful site for learning RDFa that is open for contributions, such as examples and documentation.

May 10, 2012

Simple federated queries with RDF [Part 1]

Filed under: Federation,RDF,SPARQL — Patrick Durusau @ 4:12 pm

Simple federated queries with RDF [Part 1]

Bob DuCharme writes:

A few more triples to identify some relationships, and you’re all set.

[side note] Easy aggregation without conversion is where semantic web technology shines the brightest.

Once, at an XML Summer School session, I was giving a talk about semantic web technology to a group that included several presenters from other sessions. This included Henry Thompson, who I’ve known since the SGML days. He was still a bit skeptical about RDF, and said that RDF was in the same situation as XML—that if he and I stored similar information using different vocabularies, we’d still have to convert his to use the same vocabulary as mine or vice versa before we could use our data together. I told him he was wrong—that easy aggregation without conversion is where semantic web technology shines the brightest.

I’ve finally put together an example. Let’s say that I want to query across his address book and my address book together for the first name, last name, and email address of anyone whose email address ends with “.org”. Imagine that his address book uses the vCard vocabulary and the Turtle syntax and looks like this,

Bob is an expert in more areas of markup, SGML/XML, SPARQL and other areas than I can easily count. Not to mention being a good friend.

Take a look at Bob’s post and decide for yourself how “simple” the federation is following Bob’s technique.

I am just going to let it speak for itself today.

I will outline obvious and some not so obvious steps in Bob’s “simple” federated queries in Part II.
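
For a sense of the shape involved, a query across two address books that use different vocabularies ends up looking something like the sketch below. The prefixes and properties are illustrative, not Bob’s actual data; see his post for the real thing:

PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>

# First name, last name, and email for ".org" addresses, whichever
# vocabulary the entry happens to use
SELECT ?first ?last ?email
WHERE {
  { ?entry vcard:given-name ?first ;
           vcard:family-name ?last ;
           vcard:email ?email . }
  UNION
  { ?entry foaf:givenName ?first ;
           foaf:familyName ?last ;
           foaf:mbox ?email . }
  FILTER STRENDS(STR(?email), ".org")
}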

April 6, 2012

“Give me your tired, your poor, your huddled identifiers yearning to be used.”

Filed under: Identifiers,RDF,Semantic Web — Patrick Durusau @ 6:52 pm

I was reminded of the title quote when I read Richard Wallis’s: A Fundamental Linked Data Debate.

Contrary to Richard’s imaginings, the vast majority of people on and off the Web are not waiting for the debates on the W3C’s Technical Architecture (TAG) or Linked Open Data (public-lod) mailing lists to be resolved.

Why?

They had identifiers for subjects long before the WWW, Semantic Web, Linked Data or whatever and will have identifiers for subjects long after those efforts and their successors are long forgotten.

Some of those identifiers are still in use today and will survive well into the future. Others are historical curiosities.

Moreover, when it was necessary to distinguish between identifiers and the things identified, that need was met.

Enter the WWW and its poster child, Tim Berners-Lee.

It was Tim Berners-Lee who created the problem Richard frames as: “the difference between a thing and a description of that thing.”

Amazing how much fog of discussion there has been to cover up that amateurish mistake.

The problem isn’t one of conflicting world views (a la Jeni Tennison) but rather, given a bare URI, how to interpret it? Given the bad choices made in the Garden of the Web, as it were.

That we simply abandon bare URIs as a solution has never darkened their counsel. They would rather impose the 303/TBL burden on everyone rather than admit to fundamental error.

I have a better solution.

The rest of us should carry on with the identifiers that we want to use, whether they be URIs or not. Whether they are prior identifiers or new ones. And we should put forth statements/standards/documents to establish how in our contexts, those identifiers should be used.

If IBM, Oracle, Microsoft and a few other adventurers decide that IT can benefit from some standard terminology, I am sure they can influence others to use it. Whether composed of URIs or not. And the same can be said for many other domains, most of whom will do far better than the W3C at fashioning identifiers for themselves.

Take heart TAG and LOD advocates.

As the poem says: “Give me your tired, your poor, your huddled identifiers yearning to be used.”

Someday your identifiers will be preserved as well.

April 5, 2012

SpiderStore: A Native Main Memory Approach for Graph Storage

Filed under: Graphs,RDF,SpiderStore — Patrick Durusau @ 3:37 pm

SpiderStore: A Native Main Memory Approach for Graph Storage by Robert Binna, Wolfgang Gassler, Eva Zangerle, Dominic Pacher, and Günther Specht.

Abstract:

The ever increasing amount of linked open data results in a demand for high performance graph databases. In this paper we therefore introduce a memory layout which is tailored to the storage of large RDF data sets in main memory. We present the memory layout SpiderStore. This layout features a node centric design which is in contrast to the prevailing systems using triple focused approaches. The benefit of this design is a native mapping between the nodes of a graph onto memory locations connected to each other. Based on this native mapping an addressing schema which facilitates relative addressing together with a snapshot mechanism is presented. Finally a performance evaluation, which demonstrates the capabilities of the SpiderStore memory layout, is performed using an RDF-data set consisting of about 190 mio triples.

I saw this in a tweet by Marko A. Rodriguez.

I am sure René Pickhardt will be glad to see the focus on edges in this paper. 😉

It is hard to say which experiments or lines of inquiry will lead to substantial breakthroughs, but focusing on smallish data sets is unlikely to push the envelope very hard. Even if smallish experiments are sufficient for Linked Data scenarios.

The authors project that their technique might work for up to a billion triples. Yes, well, but by 2024, one science installation will be producing one exabyte of data per day. And that is just one source of data.

The science community isn’t going to wait for the W3C to catch up, nor should they.

All Aboard for Quasi-Productive Stemming

Filed under: RDF,Semantic Web — Patrick Durusau @ 3:35 pm

All Aboard for Quasi-Productive Stemming by Bob Carpenter.

From the post:

One of the words Becky and I are having annotated for word sense (collecting 25 non-spam Mechanical Turk responses per word) is the nominal (noun) use of “board”.

One of the examples was drawn from a text with a typo where “aboard” was broken into two words, “a board”. I looked at the example, and being a huge fan of nautical fiction, said “board is very productive — we should have the nautical sense”. Then I thought a bit longer and had to admit I didn’t know what “board” meant all by itself. I did know a whole bunch of terms that involved “board” as a stem:

Highly entertaining post by Bob on the meanings of “board.”

I have a question: Which sense of board gets the URL: http://w3.org/people/TBL/OneWorldMeaning/board?

Just curious.

April 4, 2012

Linked Data Basic Profile 1.0

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 3:33 pm

Linked Data Basic Profile 1.0

A group of W3C members, IBM, DERI, EMC, Oracle, Red Hat, Tasktop and SemanticWeb.com have made a submission to the W3C with the title: Linked Data Basic Profile 1.0.

The submission consists of:

Linked Data Basic Profile 1.0

Linked Data Basic Profile Use Cases and Requirements

Linked Data Basic Profile RDF Schema

Interesting proposal.

Doesn’t try to do everything. The old 303/TBL is relegated to pagination. Probably a good use for it.

Comments?

New Paper: Linked Data Strategy for Global Identity

Filed under: Identity,RDF,Semantic Web — Patrick Durusau @ 3:32 pm

New Paper: Linked Data Strategy for Global Identity

Angela Guess writes:

Hugh Glaser and Harry Halpin have published a new PhD thesis for the University of Southampton Research Repository entitled “The Linked Data Strategy for Global Identity” (2012). The paper was published by the IEEE Computer Society. It is available for download here for non-commercial research purposes only. The abstract states, “The Web’s promise for planet-scale data integration depends on solving the thorny problem of identity: given one or more possible identifiers, how can we determine whether they refer to the same or different things? Here, the authors discuss various ways to deal with the identity problem in the context of linked data.”

At first I was hurt that I didn’t see a copy of Harry’s dissertation before it was published. I don’t always agree with him (see below) but I do like keeping up with his writing.

Then I discovered this is a four page dissertation. I guess Angela never got past the cover page. It is an article in the IEEE zine, IEEE Internet Computing.

Harry fails to mention that the HTTP 303 “trick” was made necessary by Tim Berners-Lee’s failure to understand the necessity to distinguish identifiers from addresses. Rather than admit to or correct that failure, the solution being pushed is to create web traffic overhead in the form of 303 “tricks.” “303” should be re-named, “TBL”, so we are reminded with each invocation who made it necessary. (lower middle column, page 3)

I partially agree with:

We’re only just beginning to explore the vast field of identity, and more work is needed before linked data can fulfill its full potential.(on page 5)

The “just beginning” part is true enough. But therein lies the rub. Rather than explore the “…vast field of identity…” which changes from domain to domain, first and then propose a solution, the Linked Data proponents took the other path.

They proposed a solution and in the face of its failure to work, now are inching towards the “…vast field of identity….” Seems a might late for that.

Harry concludes:

The entire bet of the linked data enterprise critically rests on using URIs to create identities for everything. Whether this succeeds might very well determine whether information integration will be trapped in centralized proprietary databases or integrated globally in a decentralized manner with open standards. Given the tremendous amount of data being created and the Web’s ubiquitous nature, URIs and equivalence links might be the best chance we have of solving the identity problem, transforming a profoundly difficult philosophical issue into a concrete engineering project.

The first line, “The entire bet….” omits to say that we need the same URIs for everything. That is called the perfect language project, which has a very long history of consistent failure. Recent attempts include Esperanto and Loglan.

The second line, “Whether this succeeds…trapped in centralized proprietary databases…” is fear mongering. “If you don’t support linked data, (insert your nightmare scenario).”

The final line, “…transforming a profoundly difficult philosophical issue into a concrete engineering project” is magical thinking.

Identity is a very troubled philosophical issue but proposing a solution without understanding the problem doesn’t sound like a high percentage shot to me. You?

The Problem With Names (and the W3C)

Filed under: RDF,Semantic Web — Patrick Durusau @ 3:30 pm

The Problem With Names by Paul Miller.

Paul details the struggle of museums to make their holdings web accessible.

The problem isn’t reluctance or a host of other issues that Paul points out.

The problem is one of identifiers, that is, names.

Museums have crafted complex identifiers for their holdings and not unreasonably expect to continue to use them.

But all they are being offered are links.

The Rijksmuseum is one of several museums around the world that is actively and enthusiastically working to open up its data, so that it may be used, enjoyed, and enriched by a whole new audience. But until some of the core infrastructure — the names, the identifiers, the terminologies, and the concepts — upon which this and other museums depend becomes truly part of the web, far too much of the opportunity created by big data releases such as the Rijksmuseum’s will be wasted.

When is the W3C going to admit that subjects can have complex names/identifiers? Not just simple links?

That would be a game changer. For everyone.

March 13, 2012

W3C HTML Data Task Force Publishes 2 Notes

Filed under: HTML Data,Microdata,RDF,Semantic Web — Patrick Durusau @ 8:16 pm

W3C HTML Data Task Force Publishes 2 Notes

From the post:

The W3C HTML Data Task Force has published two notes, the HTML Data Guide and Microdata to RDF. According to the abstract of the former, ” This guide aims to help publishers and consumers of HTML data use it well. With several syntaxes and vocabularies to choose from, it provides guidance about how to decide which meets the publisher’s or consumer’s needs. It discusses when it is necessary to mix syntaxes and vocabularies and how to publish and consume data that uses multiple formats. It describes how to create vocabularies that can be used in multiple syntaxes and general best practices about the publication and consumption of HTML data.”

One can only hope that the W3C will eventually sanctify industry standard practices for metadata. Perhaps they will call it RDF-NG. Whatever.
