Archive for the ‘RDFa’ Category

Workload Matters: Why RDF Databases Need a New Design

Saturday, May 17th, 2014

Workload Matters: Why RDF Databases Need a New Design by Güneş Aluç, M. Tamer Özsu, and Khuzaima Daudjee.


The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF is becoming widely utilized, RDF data management systems are being exposed to more diverse and dynamic workloads. Existing systems are workload-oblivious, and are therefore unable to provide consistently good performance. We propose a vision for a workload-aware and adaptive system. To realize this vision, we re-evaluate relevant existing physical design criteria for RDF and address the resulting set of new challenges.

The authors establish that RDF data management systems are in need of better processing models. However, they mention a “prototype” only in their conclusion and offer no evidence concerning their proposed alternatives for RDF processing.

I don’t doubt the need for better RDF processing but I would think the first step would be to determine the goals of RDF processing, separate and apart from the RDF model.

Simply because we conceptualize data as being encoded in “triples” does not mean that computers must process them as “triples.” They can if it is advantageous, but need not if there are better processing models.
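To make that concrete, here is a toy Python sketch (all names and data invented) of two physical layouts for the same logical triples: a single triple table versus a vertically partitioned, per-predicate layout of the kind some RDF stores already use. The logical model stays “triples” either way; the processing model need not.

```python
from collections import defaultdict

# Invented example data: three logical RDF triples.
triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]

# Layout 1: a single "triple table" -- one row per statement.
triple_table = list(triples)

# Layout 2: vertical partitioning -- one two-column table per predicate.
by_predicate = defaultdict(list)
for s, p, o in triples:
    by_predicate[p].append((s, o))

# The predicate-bound query "?s foaf:name ?o" filters the whole table in
# layout 1 but scans one small table in layout 2; the answers are identical.
names_v1 = [(s, o) for s, p, o in triple_table if p == "foaf:name"]
names_v2 = by_predicate["foaf:name"]
assert names_v1 == names_v2 == [("ex:alice", "Alice"), ("ex:bob", "Bob")]
```

Which layout wins depends entirely on the workload, which is precisely the paper’s point.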

I first saw this in a tweet by Olaf Hartig.

Three RDFa Recommendations Published

Thursday, August 22nd, 2013

Three RDFa Recommendations Published

From the announcement:

  • HTML+RDFa 1.1, which defines rules and guidelines for adapting the RDFa Core 1.1 and RDFa Lite 1.1 specifications for use in HTML5 and XHTML5. The rules defined in this specification not only apply to HTML5 documents in non-XML and XML mode, but also to HTML4 and XHTML documents interpreted through the HTML5 parsing rules.
  • The group also published two Second Editions for RDFa Core 1.1 and XHTML+RDFa 1.1, folding in the errata reported by the community since their publication as Recommendations in June 2012; all changes were editorial.
  • The group also updated the RDFa 1.1 Primer.

The deeper I get into HTML+RDFa 1.1, the more I think a random RDFa generator would be an effective weapon against government snooping.

Something copies some percentage of your text, places it in a comment, and generates random RDFa 1.1 markup for it, thus: <!-- your content + RDFa -->.

That improves the stats for the usage of RDFa 1.1, and if the government tries to follow all the RDFa 1.1 rules, well, let’s just say they will have less time for other mischief. 😉
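For the record, the mischief described above takes only a few lines of Python. Everything here is invented (the ex:* property names, the helper), and no attempt is made at spec-conformant RDFa:

```python
import random

def rdfa_chaff(text, fraction=0.3, seed=None):
    """Wrap a slice of `text` in an HTML comment carrying random RDFa-ish markup."""
    rng = random.Random(seed)
    snippet = text[: max(1, int(len(text) * fraction))]
    # Pick a random, made-up property name for the decoy annotation.
    prop = rng.choice(["ex:noise", "ex:decoy", "ex:filler"])
    return f'<!-- <span property="{prop}">{snippet}</span> -->'

comment = rdfa_chaff("RDFa is a syntax for RDF in attributes.", seed=1)
assert comment.startswith("<!--") and comment.endswith("-->")
```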

Agile Knowledge Engineering and Semantic Web (AKSW)

Sunday, April 28th, 2013

Agile Knowledge Engineering and Semantic Web (AKSW)

From the webpage:

The Research Group Agile Knowledge Engineering and Semantic Web (AKSW) is hosted by the Chair of
Business Information Systems (BIS) of the Institute of Computer Science (IfI) / University of Leipzig as well as the Institute for Applied Informatics (InfAI).


  • Development of methods, tools and applications for adaptive Knowledge Engineering in the context of the Semantic Web
  • Research of underlying Semantic Web technologies and development of fundamental Semantic Web tools and applications
  • Maturation of strategies for fruitfully combining the Social Web paradigms with semantic knowledge representation techniques

AKSW is committed to the free software, open source, open access and open knowledge movements.

Complete listing of projects.

I have mentioned several of these projects before. On seeing a reminder of the latest release of RDFaCE (RDFa Content Editor), I thought I should post on the common source of those projects.

namespace lookup for RDF developers

Tuesday, March 12th, 2013

From the about page:

The intention of this service is to simplify a common task in the work of RDF developers: remembering and looking up URI prefixes.

You can look up prefixes from the search box on the homepage, or directly by typing URLs into your browser bar (for example, by appending dc,owl.ttl to the service’s address).

New prefix mappings can be added by anyone. If multiple conflicting URIs are submitted for the same namespace, visitors can vote for the one they consider the best. You are only allowed one vote or namespace submission per day.

For n3, xml, rdfa, sparql, the result interface shows the URI prefixes in use.

But if there is more than one URI prefix, different URI prefixes appear with each example.

Don’t multiple, conflicting URI prefixes seem problematic to you?
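As a toy model of the voting scheme the about page describes (the vote counts are invented; the two Dublin Core namespaces are real, and really do compete for the dc prefix):

```python
# Several URIs proposed for one prefix; the most-voted URI wins.
candidates = {
    "dc": {
        "http://purl.org/dc/elements/1.1/": 42,  # invented vote count
        "http://purl.org/dc/terms/": 17,         # invented vote count
    },
}

def winning_uri(prefix, registry):
    """Return the URI with the most votes for `prefix`."""
    votes = registry[prefix]
    return max(votes, key=votes.get)

assert winning_uri("dc", candidates) == "http://purl.org/dc/elements/1.1/"
```

Note that the vote only picks a display winner; the conflicting mappings are all still out there, which is the problem.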

Semantic Technology ROI: Article of Faith? or Benchmarks for 1.28% of the web?

Friday, December 14th, 2012

Orri Erling, in LDBC: A Socio-technical Perspective, writes in part:

I had a conversation with Michael at a DERI meeting a couple of years ago about measuring the total cost of technology adoption, thus including socio-technical aspects such as acceptance by users, learning curves of various stakeholders, whether in fact one could demonstrate an overall gain in productivity arising from semantic technologies. [in my words, paraphrased]

“Can one measure the effectiveness of different approaches to data integration?” asked I.

“Of course one can,” answered Michael, “this only involves carrying out the same task with two different technologies, two different teams and then doing a double blind test with users. However, this never happens. Nobody does this because doing the task even once in a large organization is enormously costly and nobody will even seriously consider doubling the expense.”

LDBC does in fact intend to address technical aspects of data integration, i.e., schema conversion, entity resolution, and the like. Addressing the sociotechnical aspects of this (whether one should integrate in the first place, whether the integration result adds value, whether it violates privacy or security concerns, whether users will understand the result, what the learning curves are, etc.) is simply too diverse and so totally domain dependent that a general purpose metric cannot be developed, at least not in the time and budget constraints of the project. Further, adding a large human element in the experimental setting (e.g., how skilled the developers are, how well the stakeholders can explain their needs, how often these needs change, etc.) will lead to experiments that are so expensive to carry out and whose results will have so many unquantifiable factors that these will constitute an insuperable barrier to adoption.

The need for parallel systems to judge the benefits of a new technology is a straw man. And one that is easy to dispel.

For example, if your company provides technical support, you are tracking metrics on how quickly your staff can answer questions. And probably customer satisfaction with your technical support.

Both are common metrics in use today.

Assume someone suggests using linked data to improve technical support for your products. You begin with a pilot project to measure the benefit of the suggested change.

If the length of support calls goes down or customer satisfaction goes up, or both, switch to linked data. If not, don’t.
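The decision rule is nothing exotic. A sketch, with invented numbers:

```python
# Tracked metrics before and during the linked-data pilot (numbers invented).
baseline = {"avg_call_minutes": 11.2, "satisfaction": 0.71}
pilot = {"avg_call_minutes": 9.8, "satisfaction": 0.74}

# Switch if call length went down or satisfaction went up (or both).
adopt = (pilot["avg_call_minutes"] < baseline["avg_call_minutes"]
         or pilot["satisfaction"] > baseline["satisfaction"])
assert adopt
```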

Naming a technology as “semantic” doesn’t change how you measure the benefits of a change in process.

LDBC will find purely machine based performance measures easier to produce than answering more difficult socio-technical issues.

But of what value are great benchmarks for a technology that no one wants to use?

See my comments under: Web Data Commons (2012) – [RDFa at 1.28% of 40.5 million websites]. Benchmarks for 1.28% of the web?

Web Data Commons (2012) – [RDFa at 1.28% of 40.5 million websites]

Friday, December 14th, 2012

Web Data Commons announced the extraction results from the August 2012 Common Crawl corpus on 2012-12-10!


The August 2012 Common Crawl Corpus is available on Amazon S3 in the bucket aws-publicdatasets under the key prefix /common-crawl/parse-output/segment/ .

The numbers:

Extraction Statistics

Crawl Date: January–June 2012
Total Data: 40.1 Terabytes (compressed)
Parsed HTML URLs: 3,005,629,093
URLs with Triples: 369,254,196
Domains in Crawl: 40,600,000
Domains with Triples: 2,286,277
Typed Entities: 1,811,471,956
Triples: 7,350,953,995

See also:

Web Data Commons Extraction Report – August 2012 Corpus


Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus

Where the authors report:

Altogether we discovered structured data within 369 million of the 3 billion pages contained in the Common Crawl corpus (12.3%). The pages containing structured data originate from 2.29 million among the 40.5 million websites (PLDs) contained in the corpus (5.65%). Approximately 519 thousand websites use RDFa, while only 140 thousand websites use Microdata. Microformats are used on 1.7 million websites. It is interesting to see that Microformats are used by approximately 2.5 times as many websites as RDFa and Microdata together.

PLDs = Pay-Level-Domains.
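The quoted percentages check out against the reported counts (using the 40.5 million PLD figure from the report text; the crawl statistics above round it to 40.6 million):

```python
# Counts as reported by the Web Data Commons extraction report.
pages_with_data = 369_254_196
pages_total = 3_005_629_093
plds_with_data = 2_286_277
plds_total = 40_500_000   # "40.5 million websites (PLDs)"
plds_rdfa = 519_000       # "approximately 519 thousand websites use RDFa"

assert round(100 * pages_with_data / pages_total, 1) == 12.3
assert round(100 * plds_with_data / plds_total, 2) == 5.65
assert round(100 * plds_rdfa / plds_total, 2) == 1.28   # the 1.28% in the title
```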

The use of Microformats on “2.5 times as many websites as RDFa and Microdata together” has to make you wonder about the viability of RDFa.

Or to put it differently, if RDFa is 1.28% of the 40.5 million websites, eight (8) years after its introduction (2004) and four (4) years after reaching Recommendation status (2008), is it time to look for an alternative?

I first saw the news about the new Web Data Commons data drop in a tweet by Tobias Trapp.


Thursday, December 13th, 2012

HTML+RDFa 1.1 Support for RDFa in HTML4 and HTML5


This specification defines rules and guidelines for adapting the RDFa Core 1.1 and RDFa Lite 1.1 specifications for use in HTML5 and XHTML5. The rules defined in this specification not only apply to HTML5 documents in non-XML and XML mode, but also to HTML4 and XHTML documents interpreted through the HTML5 parsing rules.

Are You Going to Balisage?

Friday, June 1st, 2012

To the tune of “Are You Going to Scarborough Fair:”

Are you going to Balisage?
Parsley, sage, rosemary and thyme.
Remember me to one who is there,
she once was a true love of mine.

Tell her to make me an XML shirt,
Parsley, sage, rosemary, and thyme;
Without any seam or binary code,
Then she shall be a true lover of mine.


Oh, sorry! There you will see:

  • higher-order functions in XSLT
  • Schematron to enforce consistency constraints
  • relation of the XML stack (the XDM data model) to JSON
  • integrating JSON support into XDM-based technologies like XPath, XQuery, and XSLT
  • XML and non-XML syntaxes for programming languages and documents
  • type introspection in XQuery
  • using XML to control processing in a document management system
  • standardizing use of XQuery to support RESTful web interfaces
  • RDF to record relations among TEI documents
  • high-performance knowledge management system using an XML database
  • a corpus of overlap samples
  • an XSLT pipeline to translate non-XML markup for overlap into XML
  • comparative entropy of various representations of XML
  • interoperability of XML in web browsers
  • XSLT extension functions to validate OCL constraints in UML models
  • ontological analysis of documents
  • statistical methods for exploring large collections of XML data

Balisage is an annual conference devoted to the theory and practice of descriptive markup and related technologies for structuring and managing information. Participants typically include XML users, librarians, archivists, computer scientists, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, Topic-Map enthusiasts, semantic-Web evangelists, members of the working groups which define the specifications, academics, industrial researchers, representatives of governmental bodies and NGOs, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Discussion is open, candid, and unashamedly technical.

The Balisage 2012 Program is now available at:

“…Things, Not Strings”

Thursday, May 17th, 2012

The brilliance at Google spreads beyond technical chops and into their marketing department.

Effective marketing can be about what you don’t do as well as what you do.

What did Google not do with the Google Knowledge Graph?

Google Knowledge Graph does not require users to:

  • learn RDF/RDFa
  • learn OWL
  • learn various syntaxes
  • build/choose ontologies
  • use SW software
  • wait for authoritative instructions from Mount W3C

What does Google Knowledge Graph do?

It gives users information about things, things that are of interest to users. Using their web browsers.

Let’s see, we can require users to do what we want, or, we can give users what they want.

Which one do you think is the most likely to succeed? (No peeking!)

Web Developers Can Now Easily “Play” with RDFa

Monday, May 14th, 2012

Web Developers Can Now Easily “Play” with RDFa by Eric Franzon.

From the post:

Yesterday, we announced a new site devoted to helping developers add RDFa (Resource Description Framework-in-attributes) to HTML.

Building on that work, the team behind it is announcing today the release of “PLAY,” a live RDFa editor and visualization tool. This release marks a significant step in providing tools for web developers that are easy to use, even for those unaccustomed to working with RDFa.

“Play” is an effort that serves several purposes. It is an authoring environment and markup debugger for RDFa that also serves as a teaching and education tool for Web Developers. As Alex Milowski, one of the core team, said, “It can be used for purposes of experimentation, documentation (e.g. crafting an example that produces certain triples), and testing. If you want to know what markup will produce what kind of properties (triples), this tool is going to be great for understanding how you should be structuring your own data.”

A useful site for learning RDFa that is open for contributions, such as examples and documentation.

A new RDFa Test Harness

Friday, March 23rd, 2012

A new RDFa Test Harness by Gregg Kellogg.

From the post:

This is an introductory blog post on the creation of a new RDFa Test Suite. Here we discuss the use of Sinatra, Backbone.js and Bootstrap.js to run the test suite. Later will come articles on the usefulness of JSON-LD as a means of driving a test harness, generating test reports, and the use of BrowserID to deal with Distributed Denial of Service attacks that cropped up overnight.

Interesting, but it strikes me as formal/syntax validation of the RDFa in question. Useful, but only up to a point. Yes?

Can you point me to an RDFa or RDF test harness that tests the semantic “soundness” of the claims made in RDFa or RDF?

One may well exist and I have just not seen it.


Web Data Commons

Thursday, March 22nd, 2012

Web Data Commons

From the webpage:

More and more websites have started to embed structured data describing products, people, organizations, places, events into their HTML pages. The Web Data Commons project extracts this data from several billion web pages and provides the extracted data for download. Web Data Commons thus enables you to use the data without needing to crawl the Web yourself.

More and more websites embed structured data describing, for instance, products, people, organizations, places, events, resumes, and cooking recipes into their HTML pages using encoding standards such as Microformats, Microdata and RDFa. The Web Data Commons project extracts all Microformat, Microdata and RDFa data from the Common Crawl web corpus, the largest and most up-to-date web corpus that is currently available to the public, and provides the extracted data for download in the form of RDF-quads and (soon) also in the form of CSV-tables for common entity types (e.g. product, organization, location, …).

Web Data Commons thus enables you to use structured data originating from hundreds of million web pages within your applications without needing to crawl the Web yourself.

Pages in the Common Crawl corpora are included based on their PageRank score, thereby making the crawls snapshots of the current popular part of the Web.

This reminds me of the virtual observatory practice in astronomy. Astronomical data is too large to easily transfer and many who need to use the data lack the software or processing power. The solution? Holders of the data make it available via interfaces that deliver a sub-part of the data, processed according to the requester’s needs.

The Web Data Commons is much the same thing, as it frees most of us from crawling the web and/or extracting structured data from it, or at least gives us the basis for more pointed crawling of the web.

A very welcome development!

rNews 1.0: Introduction to rNews

Friday, February 17th, 2012

rNews 1.0: Introduction to rNews

The New York Times started using rNews to tag content on the 23rd of January, 2012. To use rNews as fodder for your application (mapping or otherwise), it won’t hurt to look over this introduction to rNews.

From the website:

rNews is a data model for embedding machine-readable publishing metadata in web documents and a set of suggested implementations. In this document, we’ll provide an overview of rNews and an implementation guide. We’ll get started by reviewing the class diagram of the rNews data model. Following that we’ll review each individual class. After that we will use rNews to annotate a sample news document. We will conclude with a guide for implementors of rNews.

I would validate the “rNews” periodically from any site just as a sanity check.

Introduction to: RDFa

Monday, February 6th, 2012

Introduction to: RDFa by Juan Sequeda.

From the post:

Simply put, RDFa is another syntax for RDF. The interesting aspect of RDFa is that it is embedded in HTML. This means that you can state what things on your HTML page actually mean. For example, you can specify that a certain text is the title of a blog post or it’s the name of a product or it’s the price for a certain product. This is starting to be commonly known as “adding semantic markup”.

Historically, RDFa was specified only for XHTML. Currently, RDFa 1.1 is specified for XHTML and HTML5. Additionally, RDFa 1.1 works for any XML-based language such as SVG. Recently, RDFa Lite was introduced as “a small subset of RDFa consisting of a few attributes that may be applied to most simple to moderate structured data markup tasks.” It is important to note that RDFa is not the only way to add semantics to your webpages. Microdata and Microformats are other options, and I will discuss this later on. As a reminder, you can publish your data as Linked Data through RDFa. Inside your markup, you can link to other URIs or others can link to your HTML+RDFa webpages.
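As a rough illustration of “adding semantic markup,” here is an RDFa Lite snippet stating that some text is the title and author of a blog post, plus a toy extractor built on Python’s standard-library HTML parser. The extractor is a drastic simplification, not a conformant RDFa processor; the schema.org vocabulary is real, the page content is invented.

```python
from html.parser import HTMLParser

PAGE = """
<div vocab="http://schema.org/" typeof="BlogPosting">
  <h1 property="headline">Introduction to: RDFa</h1>
  <span property="author">Juan Sequeda</span>
</div>
"""

class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from RDFa `property` attributes."""

    def __init__(self):
        super().__init__()
        self._pending = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Remember the property name until we see its text content.
        self._pending = dict(attrs).get("property", self._pending)

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None

collector = PropertyCollector()
collector.feed(PAGE)
assert collector.pairs == [
    ("headline", "Introduction to: RDFa"),
    ("author", "Juan Sequeda"),
]
```

A real consumer would also resolve vocab into full property URIs (http://schema.org/headline) and emit triples about the page’s subject, not just string pairs.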

A bit later in the post the author discusses Jeni Tennison’s comparison of RDFa and microformats.

If you are fond of inline markup, which limits you to creating new documents or editing old ones, RDFa or microformats may be of interest.

On the other hand, if you think about transient nodes such as are described in A transient hypergraph-based model for data access, then you have to wonder why you are being limited to creating new documents or editing old ones?

One assumes that if your application can read a document, you have access to its contents. If you have access to its contents, then a part of that content, either its underlying representation or the content itself, can trigger the creation of a transient node or edge (or permanent ones).

As I will discuss in a post later today, RDF conflates the tasks of identification, assignment of semantics and reasoning (at least). Which may account for it doing all three poorly. (There are other explanations but I am trying to be generous.)

RDFa 1.1

Thursday, December 8th, 2011

RDFa 1.1

From the draft:

The last couple of years have witnessed a fascinating evolution: while the Web was initially built predominantly for human consumption, web content is increasingly consumed by machines which expect some amount of structured data. Sites have started to identify a page’s title, content type, and preview image to provide appropriate information in a user’s newsfeed when she clicks the “Like” button. Search engines have started to provide richer search results by extracting fine-grained structured details from the Web pages they crawl. In turn, web publishers are producing increasing amounts of structured data within their Web content to improve their standing with search engines.

A key enabling technology behind these developments is the ability to add structured data to HTML pages directly. RDFa (Resource Description Framework in Attributes) is a technique that allows just that: it provides a set of markup attributes to augment the visual information on the Web with machine-readable hints. In this Primer, we show how to express data using RDFa in HTML, and in particular how to mark up existing human-readable Web page content to express machine-readable data.

This document provides only a Primer to RDFa. The complete specification of RDFa, with further examples, can be found in the RDFa 1.1 Core [RDFA-CORE], the XHTML+RDFa 1.1 [XHTML-RDFA], and the HTML5+RDFa 1.1 [HTML-RDFA] specifications.

I am sure this wasn’t an intentional contrast, but compare this release with that of RDFa Lite 1.1.

Which one would you rather teach a room full of newbie (or even experienced) HTML hackers?

Don’t be shy, keep your hands up!

I don’t know that RDFa Lite 1.1 is “lite” enough but I think it is getting closer to a syntax that might actually be used.

RDFa Lite 1.1

Thursday, December 8th, 2011

RDFa Lite 1.1 (new draft)

From the W3C:

One critique of RDFa is that it has too much functionality, leaving first-time authors confused about the more advanced features. RDFa Lite is a minimalist version of RDFa that helps authors easily jump into the structured data world. The goal was to outline a small subset of RDFa that will work for 80% of the Web authors out there doing simple data markup.

Well, it’s short enough.

Comments are being solicited so here’s your chance.

Still using simple identifiers for subjects, which may be sufficient in some cases. Depends. The bad part is that this doesn’t improve as you go up the chain to more complex forms of RDFa/RDF.

BTW, does anyone have a good reference for what it means to have a web of things?

Just curious what is going to be left on the cutting room floor from the Semantic Web and its “web of things?”

Will the Semantic Web be the Advertising Web that pushes content at me, whether I am interested or not?


Tuesday, November 8th, 2011


Did you know that Jena is incubating at Apache now?

Welcome to the Apache Jena project! Jena is a Java framework for building Semantic Web applications. Jena provides a collection of tools and Java libraries to help you to develop semantic web and linked-data apps, tools and servers.

The Jena Framework includes:

  • an API for reading, processing and writing RDF data in XML, N-triples and Turtle formats;
  • an ontology API for handling OWL and RDFS ontologies;
  • a rule-based inference engine for reasoning with RDF and OWL data sources;
  • stores to allow large numbers of RDF triples to be efficiently stored on disk;
  • a query engine compliant with the latest SPARQL specification
  • servers to allow RDF data to be published to other applications using a variety of protocols, including SPARQL

Apache Jena is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Incubator project. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Rob Weir has pointed out that since ODF (OpenDocument Format) 1.2 includes support for RDFa and RDF/XML, Jena may have a role to play in ODF’s future.

You can learn more about ODF 1.2 at the OpenDocument TC.

Adding support to the ODFToolkit for RDFa/RDF and/or demonstrating the benefits of RDFa/RDF in ODF 1.2 would be most welcome!

RDFa 1.1 Lite

Friday, October 21st, 2011

RDFa 1.1 Lite

From the post:

Summary: RDFa 1.1 Lite is a simple subset of RDFa consisting of the following attributes: vocab, typeof, property, rel, about and prefix.

During the workshop, a proposal was put forth by RDFa’s resident hero, Ben Adida, for a stripped down version of RDFa 1.1, called RDFa 1.1 Lite. The RDFa syntax is often criticized as having too much functionality, leaving first-time authors confused about the more advanced features. This lighter version of RDFa will help authors easily jump into the Linked Data world. The goal was to create a very minimal subset that will work for 80% of the folks out there doing simple markup for things like search engines.

I was struck by the line “…that will work for 80% of the folks out there doing simple markup for things like search engines.”

OK, so instead of people authoring content for the web, RDFa 1.1 Lite targets 80% of SEOs?

Targeting people who try to game search engine algorithms? Not a terribly sympathetic group.

HTML Data Task Force

Sunday, October 2nd, 2011

HTML Data Task Force, chaired by Jeni Tennison.

Another opportunity to participate in important work at the W3C without a membership. The “details” of getting diverse formats to work together.

Close analysis may show the need for changes to syntaxes, etc., but as far as mapping goes, topic maps can take syntaxes as they are. Could be an opportunity to demonstrate working solutions for actual use cases.

From the wikipage:

This HTML Data Task Force considers RDFa 1.1 and microdata as separate syntaxes, and conducts a technical analysis on the relationship between the two formats. The analysis discusses specific use cases and provides guidance on what format is best suited for what use cases. It further addresses the question of how different formats can be used within the same document when required and how data expressed in the different formats can be combined by consumers.

The task force MAY propose modifications in the form of bug reports and change proposals on the microdata and/or RDFa specifications, to help users to easily transition between the two syntaxes or use them together. As with all such comments, the ultimate decisions on implementing these will rest with the respective Working Groups.

Further, the Task Force should also produce draft specifications of mapping algorithms from HTML+microdata content to RDF, as well as a mapping of RDFa to microdata’s JSON format. These MAY serve as input documents to possible future recommendation-track work. These mappings should be, if possible, generic, i.e., they should not be dependent on any particular vocabulary. A goal for these mappings should be to facilitate the use of both formats with the same vocabularies without creating incompatibilities.

The Task Force will also consider design patterns for vocabularies, and provide guidance on how vocabularies should be shaped to be usable with both microdata and RDFa and potentially with microformats. These patterns MAY lead to change proposals of existing (RDF) vocabularies, and MAY result in general guidelines for the design of vocabularies for structured data on the web, building on existing community work in this area.

The Task Force liaises with the SWIG Web Schemas Task Force to ensure that lessons from real-world experience are incorporated into the Task Force recommendations and that any best practices described by the Task Force are synchronised with real-world practice.

The Task Force conducts its work through the mailing list (use this link to subscribe or look at the public archives), as well as on the #html-data-tf channel of the (public) W3C IRC server.

Microdata and RDFa Living Together in Harmony

Sunday, August 21st, 2011

Microdata and RDFa Living Together in Harmony by Jeni Tennison.

From the post:

One of the options that the TAG put forward when it asked the W3C to put together a task force on embedded data in HTML was the co-existence of RDFa and microdata. If that’s what we’re headed for, what might make things easier for consumers and publishers who have to live in that world?

In a situation where there are two competing standards, I think that developers — both on the publication and consumption sides — are going to want to hedge their bets. They will want to avoid being tied to one syntax in case it turns out that that syntax isn’t supported by the majority of publishers/consumers in the long term and they have to switch.

Publishers like us, who are aiming to share their data with whoever is interested in it (rather than having a particular consumer in mind), are also likely to want to publish in both microdata and RDFa, rather than force potential consumers to adopt a particular processing model, and will therefore need to mix the syntaxes within their pages.

Interesting and detailed analysis of the issues of reconciling microdata and RDFa.

Jeni asks if this type of analysis is worthy of something more official than a blog post.

I would say yes. I think this sort of mapping analysis should be published along with any competing format.

You would not frequent a software project that lacks version control.

Why use a data format/annotation that doesn’t provide a mapping to “competing” formats? (The emphasis being on “competing” formats. Not mappings to any possible format but to those in direct competition with the proposed format/annotation system.)

I have no objection to new formats but if there is an existing format, document its shortcomings and a mapping to the new format, along with where the mapping fails.

Doesn’t save us from competing formats but it may ease the evaluation and/or co-existence of formats.

From a topic map perspective, such a mapping is just more grist for the mill.

RDFaCE WYSIWYM RDFa Content Editor

Friday, July 15th, 2011

RDFaCE WYSIWYM RDFa Content Editor

From the announcement:

RDFaCE is an online RDFa content editor based on TinyMCE. In addition to two classical views for text authoring (WYSIWYG and HTML source code), RDFaCE supports two novel views for semantic content authoring namely WYSIWYM (What You See Is What You Mean), which highlights semantic annotations directly inline and a triple view (aka. fact view). Further features are:

  • use of different Web APIs (Sindice, Swoogle) to facilitate the semantic content authoring process.
  • combining of results from multiple NLP APIs (Alchemy, Extractive, Ontos, Evri, OpenCalais) for obtaining rich automatic semantic annotations that can be modified and extended later on.

This is very clever and a step forward for the Semantic Web.

STI Innsbruck

Wednesday, July 6th, 2011

STI Innsbruck

From the about page:

The Semantic Technology Institute (STI) Innsbruck, formerly known as DERI Innsbruck, was founded by Univ.-Prof. Dr. Dieter Fensel in 2002 and has developed into a challenging and dynamic research institute of approximately 40 people. STI Innsbruck collaborates with an international network of institutes in Asia, Europe and the USA, as well as with a number of global industrial partners.

STI Innsbruck is a founding member of STI International, a collaborative association of leading European and world wide initiatives, ensuring the success and sustainability of semantic technology development. STI Innsbruck utilizes this network, as well as contributing to it, in order to increase the impact of the research conducted within the institute. For more details on Semantics, check this interview with Frank Van Harmelen: “Search and you will find“.

I won’t try to summarize the wealth of resources you will find at STI Innsbruck. From the reading list for the curriculum to the listing of tools and publications, you will certainly find material of interest at this site.

For an optimistic view of Semantic Web activity see the interview with Frank Van Harmelen.

Semantic Web Dog Food (There’s a fly in my…)

Monday, May 30th, 2011

Semantic Web Dog Food

From the website:

Welcome to the Semantic Web Conference Corpus – a.k.a. the Semantic Web Dog Food Corpus! Here you can browse and search information on papers that were presented, people who attended, and other things that have to do with the main conferences and workshops in the area of Semantic Web research.

We currently have information about

  • 2133 papers,
  • 5020 people and
  • 1273 organisations at
  • 20 conferences and
  • 132 workshops,

and a total of 126886 unique triples in our database!

The numbers looked low to me until I read in the FAQ:

This is not just a site for ISWC [International Semantic Web Conference] and ESWC [European Semantic Web Conference] though. We hope that, in time, other metadata sets relating to Semantic Web activity will be hosted here — additional bibliographic data, test sets, community ontologies and so on.

This illustrates a persistent problem of the Semantic Web. This site has one way to encode the semantics of these papers, people, conferences and workshops. Other sources of semantic data on these papers, people, conferences and workshops may well use other ways to encode those semantics. And every group has what it feels are compelling reasons for following its choices and not the choices of others. Assuming they are even aware of the choices of others. (Discovery being another problem but I won’t talk about that now.)

The previous semantic diversity of natural language is now represented by a semantic diversity of ontologies and URIs. Now our computers can more rapidly and reliably detect that we are using different vocabularies. The SW seems like a lot of work for such a result. Particularly since we continue to use diverse vocabularies and more diverse vocabularies continue to arise.

The SW solution, using OWL Full:

5.2.1 owl:sameAs

The built-in OWL property owl:sameAs links an individual to an individual. Such an owl:sameAs statement indicates that two URI references actually refer to the same thing: the individuals have the same “identity”.

For individuals such as “people” this notion is relatively easy to understand. For example, we could state that the following two URI references actually refer to the same person:

<rdf:Description rdf:about="#William_Jefferson_Clinton">
  <owl:sameAs rdf:resource="#BillClinton"/>
</rdf:Description>

The owl:sameAs statements are often used in defining mappings between ontologies. It is unrealistic to assume everybody will use the same name to refer to individuals. That would require some grand design, which is contrary to the spirit of the web.
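What owl:sameAs asks of a consumer can be sketched in plain Python, without any RDF library. The URIs and the union-find "smushing" below are illustrative assumptions, not part of any OWL tool: every sameAs-linked name is rewritten to one canonical identifier before the data is queried.

```python
# Minimal sketch of "smushing": merging owl:sameAs-linked URIs into a
# single canonical identifier using union-find. All names are hypothetical.
SAME_AS = "owl:sameAs"

def smush(triples):
    parent = {}

    def find(x):
        # Find the canonical representative of x, with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # First pass: record every sameAs link.
    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)

    # Second pass: rewrite all remaining triples onto canonical names.
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}

triples = [
    ("#William_Jefferson_Clinton", "owl:sameAs", "#BillClinton"),
    ("#William_Jefferson_Clinton", "foaf:name", "Bill Clinton"),
    ("#BillClinton", "ex:spouse", "#HillaryClinton"),
]
# After smushing, both data triples share one canonical subject.
print(smush(triples))
```

Note that the code mechanically trusts the sameAs assertion; it records nothing about *why* the two URIs were declared identical, which is exactly the complaint raised below.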

In OWL Full, where a class can be treated as instances of (meta)classes, we can use the owl:sameAs construct to define class equality, thus indicating that two concepts have the same intensional meaning. An example:

<owl:Class rdf:ID="FootballTeam">
  <owl:sameAs rdf:resource="#SoccerTeam"/>
</owl:Class>

One could imagine this axiom to be part of a European sports ontology. The two classes are treated here as individuals, in this case as instances of the class owl:Class. This allows us to state that the class FootballTeam in some European sports ontology denotes the same concept as the class SoccerTeam in some American sports ontology. Note the difference with the statement:

<footballTeam owl:equivalentClass us:soccerTeam />

which states that the two classes have the same class extension, but are not (necessarily) the same concepts.

Anyone see a problem? Other than requiring the use of OWL Full?

The absence of any basis for “…denotes the same concept as…”? I can’t safely reuse this axiom because I don’t know on what basis its author made such a claim. The URIs may provide further information that satisfies me the axiom is correct, but that still leaves me in the dark as to why the author of the axiom thought it to be correct. Overly precise for football/soccer ontologies, you say, but what of drug interaction ontologies? Or ontologies that govern highly sensitive intelligence data?

So we repeat semantic diversity, create maps to overcome the repeated semantic diversity, and the maps we create have no explicit basis for the mappings they represent. Tell me again why this was a good idea?

RDFa API and RDFa 1.1 Primer Drafts Updated

Wednesday, April 20th, 2011

The RDF Web Applications Working Group has published new Working Drafts:

  • RDFa API
  • RDFa 1.1 Primer

Last Call: RDFa Core 1.1, XHTML+RDFa 1.1

Tuesday, April 12th, 2011

Last Call: RDFa Core 1.1, XHTML+RDFa 1.1

After posting the link to the slides on RDFa1.1 and R2ML, I went to the W3C website to check on the proposed revision of RDF (more on that later).

Anyway, I ran across the last call on RDFa Core 1.1, which reads in part:

The RDFa Working Group has published Last Call Working Drafts of RDFa Core 1.1 and XHTML+RDFa 1.1. The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience.

But where it says: “These documents contain significant amounts of structured data, which is largely unavailable to tools and applications.”, that’s not really true.

Unless search and rendering engines have been doing a real good imitation of treating that structured data as though it were available.

What I think is meant is that the semantics of the structured data have not been specified, which is an entirely different question than making it available to tools and applications.

It is an important difference because as the experience with Linked Data shows, people have many different semantics that they associate with the same data.

When tools can read, or remain ignorant of, the many different semantics associated with the same data, what is this …new world of user functionality… that becomes available?

That’s the part that I am missing.

RDFa1.1 and R2ML

Tuesday, April 12th, 2011

RDFa1.1 and R2ML

A presentation from November, 2010 at the Benelux Semantic Web Meetup, Amsterdam by Ivan Herman of the W3C.

Nothing startling if you have kept up with RDF/RDFa work at the W3C but a good summary if you have not.