Archive for the ‘OWL’ Category

Analyzing Schema.org

Thursday, October 23rd, 2014

Analyzing Schema.org by Peter F. Patel-Schneider.

Abstract:

Schema.org is a way to add machine-understandable information to web pages that is processed by the major search engines to improve search performance. The definition of schema.org is provided as a set of web pages plus a partial mapping into RDF triples with unusual properties, and is incomplete in a number of places. This analysis of and formal semantics for schema.org provides a complete basis for a plausible version of what schema.org should be.

Peter’s analysis is summarized when he says:

The lack of a complete definition of schema.org limits the possibility of extracting the correct information from web pages that have schema.org markup.

Ah, yes, “…the correct information from web pages….”

I suspect the lack of semantic precision has powered the success of schema.org. Each user of schema.org markup has their private notion of the meaning of their use of the markup and there is no formal definition to disabuse them of that notion. Not that formal definitions were enough to save owl:sameAs from varying interpretations.

Schema.org empowers varying interpretations without requiring users to ignore OWL or description logic.

For the domains that schema.org covers (eateries, movies, bars, whore houses, etc.), the semantic slippage it permits lowers the bar to using its markup, which has resulted in its being adopted more widely than other proposals.

The lesson of schema.org is that the degree of semantic slippage you can tolerate depends upon your domain. For pharmaceuticals, I would assume the tolerable degree of slippage is as close to zero as possible. For movie reviews, not so much.

Any effort to impose the same degree of semantic slippage across all domains is doomed to failure.

I first saw this in a tweet by Bob DuCharme.

Web Annotation Working Group Charter

Wednesday, July 23rd, 2014

Web Annotation Working Group Charter

From the webpage:

Annotating, which is the act of creating associations between distinct pieces of information, is a widespread activity online in many guises but currently lacks a structured approach. Web citizens make comments about online resources using either tools built into the hosting web site, external web services, or the functionality of an annotation client. Readers of ebooks make use of the tools provided by reading systems to add and share their thoughts or highlight portions of texts. Comments about photos on Flickr, videos on YouTube, audio tracks on SoundCloud, people’s posts on Facebook, or mentions of resources on Twitter could all be considered to be annotations associated with the resource being discussed.

The possibility of annotation is essential for many application areas. For example, it is standard practice for students to mark up their printed textbooks when familiarizing themselves with new materials; the ability to do the same with electronic materials (e.g., books, journal articles, or infographics) is crucial for the advancement of e-learning. Submissions of manuscripts for publication by trade publishers or scientific journals undergo review cycles involving authors and editors or peer reviewers; although the end result of this publishing process usually involves Web formats (HTML, XML, etc.), the lack of proper annotation facilities for the Web platform makes this process unnecessarily complex and time consuming. Communities developing specifications jointly, and published, eventually, on the Web, need to annotate the documents they produce to improve the efficiency of their communication.

There is a large number of closed and proprietary web-based “sticky note” and annotation systems offering annotation facilities on the Web or as part of ebook reading systems. A common complaint about these is that the user-created annotations cannot be shared, reused in another environment, archived, and so on, due to the proprietary nature of the environments where they were created. Security and privacy are also issues where annotation systems should meet user expectations.

Additionally, there are the related topics of comments and footnotes, which do not yet have standardized solutions, and which might benefit from some of the groundwork on annotations.

The goal of this Working Group is to provide an open approach for annotation, making it possible for browsers, reading systems, JavaScript libraries, and other tools, to develop an annotation ecosystem where users have access to their annotations from various environments, can share those annotations, can archive them, and use them how they wish.

Depending on how fine-grained you want your semantics, annotation is one way to convey them to others.

Unfortunately, looking at the starting point for this working group, “open” means RDF, OWL and other non-commercially adopted technologies from the W3C.

Defining the ability to point, using XQuery perhaps, and reserving to users the ability to create standards for annotation payloads, would be a much more “open” approach. That is an approach you are unlikely to see from the W3C.

I would be more than happy to be proven wrong on that point.

VOWL: Visual Notation for OWL Ontologies

Friday, April 18th, 2014

VOWL: Visual Notation for OWL Ontologies

Abstract:

The Visual Notation for OWL Ontologies (VOWL) defines a visual language for the user-oriented representation of ontologies. It provides graphical depictions for elements of the Web Ontology Language (OWL) that are combined to a force-directed graph layout visualizing the ontology.

This specification focuses on the visualization of the ontology schema (i.e. the classes, properties and datatypes, sometimes called TBox), while it also includes recommendations on how to depict individuals and data values (the ABox). Familiarity with OWL and other Semantic Web technologies is required to understand this specification.

At the end of the specification there is an interesting example, but as a “force-directed graph layout” it illustrates one of the difficulties I have with that approach.

I have this unreasonable notion that a node I select and place in the display should stay where I have placed it, not shift about because I have moved some other node. Quite annoying and I don’t find it helpful at all.
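Pinning is not hard to support in principle: in a toy spring layout, a pinned node is simply excluded from the position update. This is a generic sketch of one force-directed iteration, not VOWL’s actual algorithm, and all the names in it are mine:

```python
import math

def spring_step(pos, edges, pinned, step=0.1, rest=1.0):
    """One iteration of a toy force-directed (spring) layout.

    Connected nodes pull toward a rest distance; nodes in `pinned`
    are skipped in the update, so a node the user has placed stays
    where it was placed.
    """
    forces = {n: [0.0, 0.0] for n in pos}
    for a, b in edges:
        (xa, ya), (xb, yb) = pos[a], pos[b]
        dx, dy = xb - xa, yb - ya
        d = math.hypot(dx, dy) or 1e-9
        f = (d - rest) / d          # spring force along the edge
        forces[a][0] += f * dx; forces[a][1] += f * dy
        forces[b][0] -= f * dx; forces[b][1] -= f * dy
    return {n: (x, y) if n in pinned else (x + step * fx, y + step * fy)
            for (n, (x, y)), (fx, fy) in zip(pos.items(), forces.values())}

pos = {"Class": (0.0, 0.0), "Property": (3.0, 0.0)}
new = spring_step(pos, [("Class", "Property")], pinned={"Class"})
print(new["Class"])  # (0.0, 0.0) -- the pinned node did not move
```

The unpinned node drifts toward its neighbor; the pinned one stays put, which is all I am asking of a layout.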

I first saw this at: VOWL: Visual Notation for OWL Ontologies

Brain: … [Topic Naming Constraint Reappears]

Wednesday, April 24th, 2013

Brain: biomedical knowledge manipulation by Samuel Croset, John P. Overington and Dietrich Rebholz-Schuhmann. (Bioinformatics (2013) 29 (9): 1238-1239. doi: 10.1093/bioinformatics/btt109)

Abstract:

Summary: Brain is a Java software library facilitating the manipulation and creation of ontologies and knowledge bases represented with the Web Ontology Language (OWL).

Availability and implementation: The Java source code and the library are freely available at https://github.com/loopasam/Brain and on the Maven Central repository (GroupId: uk.ac.ebi.brain). The documentation is available at https://github.com/loopasam/Brain/wiki.

Contact: croset@ebi.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Odd how things like the topic naming constraint show up in unexpected contexts. 😉

This article may be helpful if you are required to create or read OWL based data.

But as I read the article I saw:

The names (short forms) of OWL entities handled by a Brain object have to be unique. It is for instance not possible to add an OWL class, such as http://www.example.org/Cell to the ontology if an OWL entity with the short form ‘Cell’ already exists.

The explanation?

Despite being in contradiction with some Semantic Web principles, this design prevents ambiguous queries and hides as much as possible the cumbersome interaction with prefixes and Internationalized Resource Identifiers (IRI).

I suppose, but doesn’t ambiguity exist in the mind of the user? That is, they use a term that can have more than one meaning?

Having unique terms simply means inventing odd terms that no user will know.

Rather than unambiguous, isn’t that simply unfindable?
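Brain’s uniqueness constraint amounts to a registry keyed by short form. A minimal sketch of the behavior described in the quote, with class and method names that are mine, not Brain’s API:

```python
class ShortFormRegistry:
    """Mimic Brain's design: entity short forms must be unique.

    Adding http://www.example.org/Cell fails if any entity with the
    short form 'Cell' is already registered, regardless of prefix.
    """
    def __init__(self):
        self._by_short_form = {}

    def add(self, iri):
        # take the fragment or last path segment as the short form
        short = iri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
        if short in self._by_short_form:
            raise ValueError(f"short form {short!r} already names "
                             f"{self._by_short_form[short]}")
        self._by_short_form[short] = iri
        return short

reg = ShortFormRegistry()
reg.add("http://www.example.org/Cell")
try:
    reg.add("http://other.example.org/Cell")
except ValueError as e:
    print(e)  # short form 'Cell' already names http://www.example.org/Cell
```

The convenience is real, but so is the cost: two legitimately distinct IRIs that happen to share a short form cannot coexist.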

Simple Web Semantics – Index Post

Monday, February 18th, 2013

Sam Hunting suggested that I add indexes to the Simple Web Semantics posts to facilitate navigating from one to the other.

It occurred to me that having a single index page could also be useful.

The series began with:

Understanding why something isn’t working is important before proposing a solution.

I have gotten good editorial feedback on the proposal and will be posting a revision in the next couple of days.

Nothing substantially different but clearer and more precise.

If you have any comments or suggestions, please make them at your earliest convenience.

I am always open to comments but the sooner they arrive the sooner I can make improvements.

Simple Web Semantics (SWS) – Syntax Refinement

Sunday, February 17th, 2013

In Saving the “Semantic” Web (part 5), the only additional HTML syntax I proposed was:

<meta name="dictionary" content="URI">

in the <head> element of an HTML document.

(Where you would locate the equivalent declaration of a URI dictionary in other document formats will vary.)

But that sets the URI dictionary for an entire document.

What if you want more fine grained control over the URI dictionary for a particular URI?

It would be possible to do something complicated with namespaces, containers, scope, etc. but the simpler solution would be:

<a dictionary="URI" href="yourURI">

Either the URI is governed by the declaration for the entire page or it has a declared dictionary URI.

Or to summarize the HTML syntax of SWS at this point:

<meta name="dictionary" content="URI">

<a dictionary="URI" href="yourURI">
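The resolution rule is simple enough to sketch: a dictionary attribute on the <a> element wins; otherwise the page-wide declaration applies. A minimal sketch using Python’s stdlib HTML parser; the class name is mine, the markup is the SWS syntax above:

```python
from html.parser import HTMLParser

class SWSDictionaryResolver(HTMLParser):
    """Collect the dictionary URI governing each href in an SWS document."""

    def __init__(self):
        super().__init__()
        self.page_dictionary = None
        self.links = {}  # href -> governing dictionary URI (or None)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "dictionary":
            # page-wide declaration from the <head>
            self.page_dictionary = attrs.get("content")
        elif tag == "a" and "href" in attrs:
            # per-link dictionary overrides the page-wide one
            self.links[attrs["href"]] = attrs.get("dictionary",
                                                  self.page_dictionary)

doc = """<html><head><meta name="dictionary" content="http://example.org/dict">
</head><body><a href="http://example.org/a">a</a>
<a dictionary="http://example.org/other" href="http://example.org/b">b</a>
</body></html>"""

r = SWSDictionaryResolver()
r.feed(doc)
print(r.links["http://example.org/a"])  # http://example.org/dict
print(r.links["http://example.org/b"])  # http://example.org/other
```

Note that an off-the-shelf parser passes the unknown dictionary attribute through untouched, which is part of the appeal: no new parsing machinery is required.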

Saving the “Semantic” Web (part 3)

Tuesday, February 12th, 2013

On Semantic Transparency

The first responder to this series of posts, j22, argues the logic of the Semantic Web has been found to be useful.

I said as much in my post and stand by that position.

The difficulty is that the “logic” of the Semantic Web excludes vast swathes of human expression and the people who would make those expressions.

If you need authority for that proposition, consider George Boole (An Investigation of the Laws of Thought, pp. 327-328):

But the very same class of considerations shows with equal force the error of those who regard the study of Mathematics, and of their applications, as a sufficient basis either of knowledge or of discipline. If the constitution of the material frame is mathematical, it is not merely so. If the mind, in its capacity of formal reasoning, obeys, whether consciously or unconsciously, mathematical laws, it claims through its other capacities of sentiment and action, through its perceptions of beauty and of moral fitness, through its deep springs of emotion and affection, to hold relation to a different order of things. There is, moreover, a breadth of intellectual vision, a power of sympathy with truth in all its forms and manifestations, which is not measured by the force and subtlety of the dialectic faculty. Even the revelation of the material universe in its boundless magnitude, and pervading order, and constancy of law, is not necessarily the most fully apprehended by him who has traced with minutest accuracy the steps of the great demonstration. And if we embrace in our survey the interests and duties of life, how little do any processes of mere ratiocination enable us to comprehend the weightier questions which they present! As truly, therefore, as the cultivation of the mathematical or deductive faculty is a part of intellectual discipline, so truly is it only a part. The prejudice which would either banish or make supreme any one department of knowledge or faculty of mind, betrays not only error of judgment, but a defect of that intellectual modesty which is inseparable from a pure devotion to truth. It assumes the office of criticising a constitution of things which no human appointment has established, or can annul. It sets aside the ancient and just conception of truth as one though manifold. 
Much of this error, as actually existent among us, seems due to the special and isolated character of scientific teaching—which character it, in its turn, tends to foster. The study of philosophy, notwithstanding a few marked instances of exception, has failed to keep pace with the advance of the several departments of knowledge, whose mutual relations it is its province to determine. It is impossible, however, not to contemplate the particular evil in question as part of a larger system, and connect it with the too prevalent view of knowledge as a merely secular thing, and with the undue predominance, already adverted to, of those motives, legitimate within their proper limits, which are founded upon a regard to its secular advantages. In the extreme case it is not difficult to see that the continued operation of such motives, uncontrolled by any higher principles of action, uncorrected by the personal influence of superior minds, must tend to lower the standard of thought in reference to the objects of knowledge, and to render void and ineffectual whatsoever elements of a noble faith may still survive.

Or Justice Holmes writing in 1881 (The Common Law, page 1)

The life of the law has not been logic: it has been experience. The felt necessities of the time, the prevalent moral and political theories, intuitions of public policy, avowed or unconscious, even the prejudices which judges share with their fellow-men, have had a good deal more to do than the syllogism in determining the rules by which men should be governed. The law embodies the story of a nation’s development through many centuries, and it cannot be dealt with as if it contained only the axioms and corollaries of a book of mathematics.

In terms of historical context, remember that Holmes is writing at a time when works like John Stuart Mill’s A System of Logic, Ratiocinative and Inductive: being a connected view of The Principles of Evidence and the Methods of Scientific Investigation, were in high fashion.

The Semantic Web isn’t the first time “logic” has been seized upon as useful (as no doubt it is) and exclusionary (the part I object to) of other approaches.

Rather than presuming the semantic monotone the Semantic Web needs for its logic, a false presumption for owl:sameAs and no doubt other subjects, why not empower users to use more complex identifiers for subjects than solitary URIs?

It would not take anything away from the current Semantic Web infrastructure; it would simply make its basis, URIs, less semantically opaque to users.

Isn’t semantic transparency a good thing?


The Semantic Web Is Failing — But Why? (Part 5)

Thursday, February 7th, 2013

Impoverished Identification by URI

There is one final piece of the Semantic Web failure puzzle to explore before we can talk about a solution.

In owl:sameAs and Linked Data: An Empirical Study, Ding, Shinavier, Finin and McGuinness write:

Our experimental results have led us to identify several issues involving the owl:sameAs property as it is used in practice in a linked data context. These include how best to manage owl:sameAs assertions from “third parties”, problems in merging assertions from sources with different contexts, and the need to explore an operational semantics distinct from the strict logical meaning provided by OWL.

To resolve varying usages of owl:sameAs, the authors go beyond identifications provided by a URI to look to other properties. For example:

Many owl:sameAs statements are asserted due to the equivalence of the primary feature of resource description, e.g. the URIs of FOAF profiles of a person may be linked just because they refer to the same person even if the URIs refer to the person at different ages. The odd mashup on job-title in the previous section is a good example for why the URIs in different FOAF profiles are not fully equivalent. Therefore, the empirical usage of owl:sameAs only captures the equivalence semantics on the projection of the URI on the social entity dimension (removing the time and space dimensions). In this way, owl:sameAs is used to indicate partial equivalence between two different URIs, which should not be considered as full equivalence.

Knowing the dimensions covered by a URI and the dimensions covered by a property, it is possible to conduct better data integration using owl:sameAs. For example, since we know a URI of a person provides a temporal-spatial identity, descriptions using time-sensitive properties, e.g. age, height and workplace, should not be aggregated, while time-insensitive properties, such as eye color and social security number, may be aggregated in most cases.

When an identification is insufficient based on a single URI, additional properties can be considered.
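The authors’ suggestion, aggregate only time-insensitive properties across owl:sameAs links, can be sketched in a few lines. The property split and the profile data here are my own stand-ins, not the paper’s:

```python
# Hypothetical split: properties safe to merge across owl:sameAs links
TIME_INSENSITIVE = {"eyeColor", "socialSecurityNumber"}

def merge_profiles(profiles):
    """Merge descriptions of URIs linked by owl:sameAs.

    Only time-insensitive properties are aggregated; time-sensitive
    ones (age, workplace, ...) are kept per-URI, since the URIs may
    describe the same person at different times.
    """
    merged = {}
    for uri, props in profiles.items():
        for prop, value in props.items():
            if prop in TIME_INSENSITIVE:
                merged.setdefault(prop, set()).add(value)
    return merged

profiles = {
    "http://a.example/me": {"age": "25", "eyeColor": "brown"},
    "http://b.example/me": {"age": "40", "eyeColor": "brown"},
}
print(merge_profiles(profiles))  # {'eyeColor': {'brown'}}
```

The conflicting ages are left alone rather than smashed together, which is exactly the partial equivalence the paper describes.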

My question then is why do ordinary users have to wait for experts to decide their identifications are insufficient? Why can’t we empower users to declare multiple properties, including URIs, as a means of identification?

It could be something as simple as JSON key/value pairs with a notation of “+” for must match, “-” for must not match, and “?” for optional to match.

A declaration of identity by users about the subjects in their documents. Who better to ask?
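Such a declaration might look like the following. The key prefixes, property names, and matching helper are illustrative, not a defined syntax:

```python
import json

def matches(declaration, candidate):
    """Check a candidate subject's properties against a declared identity.

    The declaration is a JSON object whose keys are property names
    prefixed with '+' (must match), '-' (must not match), or '?'
    (optional to match).
    """
    for key, value in json.loads(declaration).items():
        flag, prop = key[0], key[1:]
        if flag == "+" and candidate.get(prop) != value:
            return False
        if flag == "-" and candidate.get(prop) == value:
            return False
        # '?' keys never cause a mismatch
    return True

# Two must-match properties, one must-not-match, one optional.
decl = json.dumps({"+uri": "http://example.org/sam", "+name": "Sam",
                   "-birthDate": "1990-01-01", "?workplace": "Acme"})

print(matches(decl, {"uri": "http://example.org/sam", "name": "Sam"}))  # True
print(matches(decl, {"uri": "http://example.org/sam", "name": "Sam",
                     "birthDate": "1990-01-01"}))                       # False
```

Nothing here requires a reasoner or a description logic, just key/value comparison, which is the point.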

Not to mention that the more information a user supplies for an identification, the more likely they are to communicate successfully with other users.

URIs may be Tim Berners-Lee’s nails, but they are insufficient to support the scaffolding required for robust communication.


The next series starts with Saving the “Semantic” Web (Part 1)

The Semantic Web Is Failing — But Why? (Part 1)

Thursday, February 7th, 2013

Introduction

Before proposing yet another method for identification and annotation of entities in digital media, it is important to draw lessons from existing systems. Failing systems in particular, so that their mistakes are not repeated or compounded. The Semantic Web is an example of such a system.

Doubters of that claim should read the report Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus by Web Data Commons.

Web Data Commons is a structured data research project based at the Research Group Data and Web Science at the University of Mannheim and the Institute AIFB at the Karlsruhe Institute of Technology. Supported by PlanetData and LOD2 research projects, the Web Data Commons is not opposed to the Semantic Web.

But the Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus document reports:

Altogether we discovered structured data within 369 million of the 3 billion pages contained in the Common Crawl corpus (12.3%). The pages containing structured data originate from 2.29 million among the 40.5 million websites (PLDs) contained in the corpus (5.65%). Approximately 519 thousand websites use RDFa, while only 140 thousand websites use Microdata. Microformats are used on 1.7 million websites. It is interesting to see that Microformats are used by approximately 2.5 times as many websites as RDFa and Microdata together. (emphasis added)

To sharpen the point, RDFa appears on only 1.28% of the 40.5 million websites, eight (8) years after its introduction (2004) and four (4) years after reaching Recommendation status (2008).

Or more generally:

Parsed HTML URLs 3,005,629,093
URLs with Triples 369,254,196

Or in layperson’s terms, for this web corpus, parsed HTML URLs outnumber URLs with triples by approximately eight to one.
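The arithmetic behind these figures is easy to check from the numbers quoted above:

```python
# Figures from the Web Data Commons August 2012 corpus report
parsed_html_urls = 3_005_629_093
urls_with_triples = 369_254_196
rdfa_websites = 519_000       # "approximately 519 thousand"
total_websites = 40_500_000   # PLDs in the corpus

print(round(parsed_html_urls / urls_with_triples, 1))        # 8.1 to 1
print(round(100 * urls_with_triples / parsed_html_urls, 1))  # 12.3 (%)
print(round(100 * rdfa_websites / total_websites, 2))        # 1.28 (%)
```

The 12.3% structured-data figure and the eight-to-one ratio are the same fact seen from two directions.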

Being mindful that the corpus is only web-accessible data and excludes “dark data,” the need for a more robust solution than the Semantic Web is self-evident.

The failure of the Semantic Web is no assurance that any alternative proposal will fare better. Understanding why the Semantic Web is failing is a prerequisite to any successful alternative.


Before you “flame on,” you might want to read the entire series. I end up with a suggestion based on work by Ding, Shinavier, Finin and McGuinness.


The next series starts with Saving the “Semantic” Web (Part 1)

Chinese Rock Music

Tuesday, January 15th, 2013

Experiences on semantifying a Mediawiki for the biggest resource about Chinese rock music: rockinchina .com by René Pickhardt.

From the post:

During my trip in China I was visiting Beijing on two weekends and Macau on another weekend. These trips have been mainly motivated to meet old friends. Especially the heads behind the biggest English resource of Chinese Rock music Rock in China who are Max-Leonhard von Schaper and the founder of the biggest Chinese Rock Print Magazin Yang Yu. After looking at their wiki which is pure gold in terms of content but consists mainly of plain text I introduced them to the idea of putting semantics inside the project. While consulting them a little bit and pointing them to the right resources Max did basically the entire work (by taking a one month holiday from his job. Boy this is passion!).

I am very happy to announce that the data of rock in china is published as linked open data and the process of semantifying the website is in great shape. In the following you can read about Max’s experiences doing the work. This is particularly interesting because Max has no scientific background in semantic technologies. So we can learn a lot on how to improve these technologies to be ready to be used by everybody:

Good to see that René hasn’t lost his touch for long blog titles. 😉

A very valuable lesson in the difficulties posed by current “semantic” technologies.

Max and company succeed, but only after heroic efforts.

Reasoning with the Variation Ontology using Apache Jena #OWL #RDF

Monday, August 27th, 2012

Reasoning with the Variation Ontology using Apache Jena #OWL #RDF by Pierre Lindenbaum.

From the post:

The Variation Ontology (VariO), “is an ontology for standardized, systematic description of effects, consequences and mechanisms of variations”.

In this post I will use the Apache Jena library for RDF to load this ontology. It will then be used to extract a set of variations that are a sub-class of a given class of Variation.

If you are interested in this example, you may also be interested in the Variation Ontology.
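The core of Pierre’s exercise, pulling out the subclasses of a given class, can be sketched without Jena as a transitive walk over rdfs:subClassOf links. The class names below are made up for the example; with Jena (or rdflib in Python) a reasoner or a SPARQL property path would do this for you:

```python
def subclasses_of(target, subclass_of):
    """Collect all (transitive) subclasses of `target`.

    `subclass_of` maps a class to its direct superclasses, standing in
    for rdfs:subClassOf triples loaded from an ontology such as VariO.
    """
    result = set()
    for cls, supers in subclass_of.items():
        # walk upward from cls to see whether target is reachable
        seen, stack = set(), list(supers)
        while stack:
            s = stack.pop()
            if s == target:
                result.add(cls)
                break
            if s not in seen:
                seen.add(s)
                stack.extend(subclass_of.get(s, []))
    return result

triples = {
    "vario:insertion": ["vario:DNA_variation"],
    "vario:deletion": ["vario:DNA_variation"],
    "vario:DNA_variation": ["vario:variation"],
}
print(sorted(subclasses_of("vario:variation", triples)))
# ['vario:DNA_variation', 'vario:deletion', 'vario:insertion']
```

The point of using a library like Jena is that this walk, plus the rest of RDFS/OWL entailment, comes for free once the ontology is loaded.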

The VariO homepage reports:

VariO allows

  • consistent naming
  • annotation of variation effects
  • data integration
  • comparison of variations and datasets
  • statistical studies
  • development of software tools

It isn’t clear on a quick read, how VariO accomplishes:

  • data integration
  • comparison of variations and datasets

Unless it means that uniform recordation using VariO enables “data integration” and “comparison of variations and datasets.”

True, but what nomenclature, uniformly used, does not enable “data integration” and “comparison of variations and datasets”?

Is there one?

Stardog 1.0

Sunday, June 24th, 2012

Stardog 1.0 by Kendall Clark.

From the post:

Today I’m happy to announce the release of Stardog 1.0, the fastest, smartest, and easiest to use RDF database on the planet. Stardog fills a hole in the Semantic Technology (and NoSQL database) market for an RDF database that is fast, zero config, lightweight, and feature-rich.

Speed Kills

RDF and OWL are excellent technologies for building data integration and analysis apps. Those apps invariably require complex query processing, i.e., queries where there are lots of joins, complex logical conditions to evaluate, etc. Stardog is targeted at query performance for complex SPARQL queries. We publish performance data so you can see how we’re doing.

Braindead Simple Deployment

Winners ship. Period.

We care very much about simple deployments. Stardog works out-of-the-box with minimal (none, typically) configuration. You shouldn’t have to fight an RDF database for days to install or tune it for great performance. Because Stardog is pure Java, it will run anywhere. It just works and it’s damn fast. You shouldn’t need to buy and configure a cluster of machines to get blazing fast performance from an RDF database. And now you don’t have to.

One More Thing…OWL Reasoning

Finally, Stardog has the deepest, most comprehensive, and best OWL reasoning support of any commercial RDF database available.

Stardog 1.0 supports RDFS, OWL 2 QL, EL, and RL, as well as OWL 2 DL schema-reasoning. It’s also the only RDF database to support closed-world integrity constraint validation and automatic explanations of integrity constraint violations.

If you care about data quality, Stardog 1.0 is worth a hard look.

OK, so I have signed up for an evaluation version, key, etc. Email just arrived.

Downloaded software and license key.

With all the open data lying around, it should not be hard to find test data.

More to follow. Comments welcome.

Is There A Dictionary In The House? (Savanna – Think Software)

Thursday, April 12th, 2012

Reading a white paper on an integration solution from Thetus Corporation (on its Savanna product line) when I encountered:

Savanna supports the core architectural premise that the integration of external services and components is an essential element of any enterprise platform by providing out-of-the-box integrations with many of the technologies and programs already in use in the DI2E framework. These investments include existing programs, such as: the Intelligence Community Data Layer (ICDL), OPTIC (force protection application), WATCHDOG (Terrorist Watchlist 2.0), SERENGETI (AFRICOM socio-cultural analysis), SCAN-R (EUCOM deep futures analysis); and, in the future: TAC (tripwire search and analysis), and HSCB-funded modeling capabilities, including Signature Analyst and others. To further make use of existing external services and components, the proposed solution includes integration points for commercial and opensource software, including: SOLR (indexing), Open Sextant (geotagging), Apache OpenNLP (entity extraction), R (statistical analysis), ESRI (geo-processing), OpenSGI GeoCache (geospatial data), i2 Analyst’s Notebook (charting and analysis) and a variety of structured and unstructured data repositories.

I have to plead ignorance of the “existing program” alphabet soup but I am familiar with several of the open source packages.

I am not sure what an “integration point” for an unknown future use of any of those packages would look like. Do you? Their output can be used by any program but that hardly qualifies the other program as having an “integration point.”

I am sensitive to the use of “integration” because to me it means there is some basis for integration. So a user having integrated data once, can re-use and possibly enhance the basis for integration of data with other data. (We call that “merging” in topic map land.)

Integration and even reuse is mentioned: “The Savanna architecture prevents creating a set of comparable reuse issues at the enterprise scale by providing a set of interconnected and flexible models that articulate how analysis assets are sourced and created and how they are used by the community.” (page 16)

But not in enough detail to really evaluate the basis for re-use of data, data structures, enrichment of the same, etc.

Looked around for an SDK or such but came up empty.

Point of amusement:

It’s official, we’re debuting our newest release of Savanna at DoDIIS (March 21, 2012). (DoDIIS: the Department of Defense Intelligence Information Systems Worldwide Conference.)

The next blog entry by date?

Happy Peaceful Birthday to the Peace Corps (March 1, 2012)

I would appreciate hearing from anyone with information or stories to tell about how Savanna works in practice.

In particular I am interested in whether two distinct Savanna installations can share information in a blind interchange? That should be the test of re-use of information by another installation.

Moreover, do I have to convert data between formats or can data structures themselves be entities with properties?

PS: I am not overly impressed with the use of OWL for modeling in Savanna. The experience with “big data” has shown that starting with data first leads to different, perhaps more useful models than the other way around.

Premature modeling with OWL will result in models that are “useful” in meeting the expectations of the creating analyst. That may not be the criteria of “usefulness” that is required.

Wyner and Hoekstra on A Legal Case OWL Ontology with an Instantiation of Popov v. Hayashi

Friday, March 16th, 2012

Wyner and Hoekstra on A Legal Case OWL Ontology with an Instantiation of Popov v. Hayashi

From Legalinformatics:

Dr. Adam Wyner of the University of Leeds Centre for Digital Citizenship and Dr. Rinke Hoekstra of the University of Amsterdam’s Leibniz Center for Law have published A legal case OWL ontology with an instantiation of Popov v. Hayashi, forthcoming in Artificial Intelligence and Law. Here is the abstract:

The legal case ontology here.

I have a history with logic and the law that stretches over decades. Rather than comment now, interested in what you think? What is strong/weak about this proposal?

MapReduceXMT

Saturday, March 3rd, 2012

MapReduceXMT from Sandia National Laboratories.

From the webpage:

Welcome to MapReduceXMT

MapReduceXMT is a library that ports the MapReduce paradigm to the Cray XMT.

MapReduceXMT is copyrighted and released under a Berkeley open source license. However, the code is still very much in development and there has not been a formal release of the software.

SPEED-MT Semantic Processing Executed Efficiently and Dynamically

This trac site is currently being used to house SPEED-MT, which contains a set of algorithms and data structures for processing semantic web data on the Cray XMT.

SPEED-MT Modules

  • Dictionary Encoding
  • Decoding
  • RDFS/OWL Closure
  • RDF Stats
  • RDF Dedup

OK, so this one is tied a little more closely to the Cray XMT. 😉

But the modules are ones likely to be of interest for processing RDF triples/quads.
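For a sense of what the RDFS/OWL Closure module computes, here is a single-machine sketch of a small RDFS fragment (subclass transitivity plus type inheritance). SPEED-MT computes this sort of closure in parallel on the XMT; the triples below are invented for the example:

```python
def rdfs_closure(triples):
    """Compute a tiny fragment of the RDFS closure over (s, p, o) triples:
    transitive rdfs:subClassOf plus rdf:type inheritance."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closed):
            if p != "rdfs:subClassOf":
                continue
            for s2, p2, o2 in list(closed):
                # subClassOf is transitive
                if p2 == "rdfs:subClassOf" and s2 == o and (s, p, o2) not in closed:
                    closed.add((s, "rdfs:subClassOf", o2)); changed = True
                # instances inherit types up the class hierarchy
                if p2 == "rdf:type" and o2 == s and (s2, "rdf:type", o) not in closed:
                    closed.add((s2, "rdf:type", o)); changed = True
    return closed

triples = {
    (":Dog", "rdfs:subClassOf", ":Mammal"),
    (":Mammal", "rdfs:subClassOf", ":Animal"),
    (":rex", "rdf:type", ":Dog"),
}
closure = rdfs_closure(triples)
print((":rex", "rdf:type", ":Animal") in closure)  # True
```

The naive fixed-point loop is quadratic per pass, which is exactly why closure over web-scale triple sets calls for hardware like the XMT.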

This was cited in “High-performance Computing Applied to Semantic Databases” article that I covered in Is That A Graph In Your Cray?

OWL: Yet to Arrive on the Web of Data?

Wednesday, February 15th, 2012

OWL: Yet to Arrive on the Web of Data? by Angela Guess.

From the post:

A new paper is currently available for download entitled OWL: Yet to arrive on the Web of Data? The paper was written by Birte Glimm, Aidan Hogan, Markus Krötzsch, and Axel Polleres. The abstract states, “Seven years on from OWL becoming a W3C recommendation, and two years on from the more recent OWL 2 W3C recommendation, OWL has still experienced only patchy uptake on the Web. Although certain OWL features (like owl:sameAs) are very popular, other features of OWL are largely neglected by publishers in the Linked Data world.”

It continues, “This may suggest that despite the promise of easy implementations and the proposal of tractable profiles suggested in OWL’s second version, there is still no “right” standard fragment for the Linked Data community. In this paper, we (1) analyse uptake of OWL on the Web of Data, (2) gain insights into the OWL fragment that is actually used/usable on the Web, where we arrive at the conclusion that this fragment is likely to be a simplified profile based on OWL RL, (3) propose and discuss such a new fragment, which we call OWL LD (for Linked Data).”

Interesting and perhaps valuable data about the use of RDFS/OWL primitives on the Web.

I find it curious that the authors don’t survey users about what OWL capabilities they would find compelling. It could be that users are interested in and willing to support some subset of OWL that hasn’t been considered by the authors or others.

Might not be the Semantic Web as the authors envision it, but without broad user support, the authors’ Semantic Web will never come to pass.

Semantic Web – Sweet Spot(s) and ‘Gold Standards’

Monday, January 23rd, 2012

Mike Bergman posted a two-part series on how to make the Semantic Web work:

Seeking a Semantic Web Sweet Spot

In Search of ‘Gold Standards’ for the Semantic Web

Both are worth your time to read but the second sets the bar for “Gold Standards” for the Semantic Web as:

The need for gold standards for the semantic Web is particularly acute. First, by definition, the scope of the semantic Web is all things and all concepts and all entities. Second, because it embraces human knowledge, it also embraces all human languages with the nuances and varieties thereof. There is an immense gulf in referenceability from the starting languages of the semantic Web in RDF, RDFS and OWL to this full scope. This gulf is chiefly one of vocabulary (or lack thereof). We know how to construct our grammars, but we have few words with understood relationships between them to put in the slots.

The types of gold standards useful to the semantic Web are similar to those useful to our analogy of human languages. We need guidance on structure (syntax and grammar), plus reference vocabularies that encompass the scope of the semantic Web (that is, everything). Like human languages, the vocabulary references should have analogs to dictionaries, thesauri and encyclopedias. We want our references to deal with the specific demands of the semantic Web in capturing the lexical basis of human languages and the connectedness (or not) of things. We also want bases by which all of this information can be related to different human languages.

To capture these criteria, then, I submit we should consider a basic starting set of gold standards:

  • RDF/RDFS/OWL — the data model and basic building blocks for the languages
  • Wikipedia — the standard reference vocabulary of things, concepts and entities, plus other structural guidances
  • WordNet — lexical language references as an aid to natural language processing, and
  • UMBEL — the structural reference for the connectedness of things for basic coherence and inference, plus a vocabulary for mapping amongst reference structures and things.

Each of these potential gold standards is next discussed in turn. The majority of discussion centers on Wikipedia and UMBEL.

There is one criterion that Mike leaves out: Choice of a majority of users.

Use by a majority of users is a sweet spot that brooks no argument.

Advice regarding future directions for Protégé

Friday, September 30th, 2011

Advice regarding future directions for Protégé

Mark Musen, Principal Investigator, The Protégé Project, posted the following request to the protege-users mailing list:

I am writing to seek your advice regarding future directions for the Protégé Project. As you know, all the work that we perform on the Protégé suite of tools is supported by external funding, nearly all from federal research grants. We currently are seeking additional grant support to migrate some of the features that are available in Protégé Version 3 to Protégé Version 4. We believe that this migration is important, as only Protégé 4 supports the full OWL 2 standard, and we appreciate that many members of our user community are asking to use certain capabilities currently unique to Protégé 3 with OWL 2 ontologies in Protégé 4.

To help the Protégé team in setting priorities, and to help us make the case to our potential funders that enhancement of Protégé 4 is warranted, we’d be grateful if you could please fill out the brief survey at the following URL:

http://www.surveymonkey.com/s/ProtegeDirections

It will not take more than a few minutes for you to give us feedback that will be influential in setting our future goals. If we can document strong community support for implementing certain Protégé 3 features in Protégé 4, then we will be in a much stronger position to make the case to our funders to initiate the required work.

The entire Protégé team is looking forward to your opinions. Please be sure to forward this message to colleagues who use Protégé who may not subscribe to these mailing lists so that we can obtain as much feedback as possible.

Many thanks for your help and support.

Please participate in this survey (there are only 7 questions, one of which is optional) and ask others to participate as well.

QUDT – Quantities, Units, Dimensions and Data Types in OWL and XML

Monday, September 12th, 2011

QUDT – Quantities, Units, Dimensions and Data Types in OWL and XML

From background:

The QUDT Ontologies, and derived XML Vocabularies, are being developed by TopQuadrant and NASA. Originally, they were developed for the NASA Exploration Initiatives Ontology Models (NExIOM) project, a Constellation Program initiative at the AMES Research Center (ARC). The goals of the QUDT ontology are twofold:

  • to provide a unified model of measurable quantities, units for measuring different kinds of quantities, the numerical values of quantities in different units of measure and the data structures and data types used to store and manipulate these objects in software;
  • to populate the model with the instance data (quantities, units, quantity values, etc.) required to meet the life-cycle needs of the Constellation Program engineering community.

If you are looking for measurements, this would be one place to start.
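The core of QUDT’s conversion machinery is easy to sketch: each unit carries a multiplier (and, for scales like temperature, an offset) onto its SI base unit — QUDT models these as `qudt:conversionMultiplier` and `qudt:conversionOffset` — so any two units of the same quantity kind convert through SI. The unit records below are hand-filled illustrations, not data loaded from the QUDT ontology itself:

```python
# Illustrative sketch of QUDT's unit-conversion pattern. The entries below
# are hand-copied approximations of QUDT-style unit records, not live data.

UNITS = {
    # unit: (quantity kind, multiplier to SI base, offset to SI base)
    "FT":    ("Length",      0.3048,  0.0),         # foot -> metre
    "M":     ("Length",      1.0,     0.0),         # metre (SI base)
    "DEG_F": ("Temperature", 5.0/9.0, 255.372222),  # Fahrenheit -> kelvin
    "K":     ("Temperature", 1.0,     0.0),         # kelvin (SI base)
}

def convert(value, from_unit, to_unit):
    """Convert via the SI base unit: si = value * multiplier + offset."""
    kind_a, mult_a, off_a = UNITS[from_unit]
    kind_b, mult_b, off_b = UNITS[to_unit]
    if kind_a != kind_b:
        raise ValueError(f"cannot convert {kind_a} to {kind_b}")
    si_value = value * mult_a + off_a
    return (si_value - off_b) / mult_b

print(convert(10, "FT", "M"))               # 3.048
print(round(convert(32, "DEG_F", "K"), 3))  # 273.15
```

Because every unit is anchored to an SI base, the model never needs a pairwise conversion table, which is what makes a single ontology for all units tractable.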

Semantic Web Journal – Vol. 2, Number 2 / 2011

Wednesday, August 31st, 2011

Semantic Web Journal – Vol. 2, Number 2 / 2011

Just in case you want to send someone the link to a particular article:

Semantic Web surveys and applications
DOI 10.3233/SW-2011-0047 Authors Pascal Hitzler and Krzysztof Janowicz

Taking flight with OWL2
DOI 10.3233/SW-2011-0048 Author Michel Dumontier

Comparison of reasoners for large ontologies in the OWL 2 EL profile
DOI 10.3233/SW-2011-0034 Authors Kathrin Dentler, Ronald Cornet, Annette ten Teije and Nicolette de Keizer

Approaches to visualising Linked Data: A survey
DOI 10.3233/SW-2011-0037 Authors Aba-Sah Dadzie and Matthew Rowe

Is Question Answering fit for the Semantic Web?: A survey
DOI 10.3233/SW-2011-0041 Authors Vanessa Lopez, Victoria Uren, Marta Sabou and Enrico Motta

FactForge: A fast track to the Web of data
DOI 10.3233/SW-2011-0040 Authors Barry Bishop, Atanas Kiryakov, Damyan Ognyanov, Ivan Peikov, Zdravko Tashev and Ruslan Velkov

GDB for the Data Driven Age (STI Summit Position Paper)

Saturday, July 30th, 2011

GDB for the Data Driven Age (STI Summit Position Paper) by Orri Erling.

From the post:

The Semantic Technology Institute (STI) is organizing a meeting around the questions of making semantic technology deliver on its promise. We were asked to present a position paper (reproduced below). This is another recap of our position on making graph databasing come of age. While the database technology matters are getting tackled, we are drawing closer to the question of deciding actually what kind of inference will be needed close to the data. My personal wish is to use this summit for clarifying exactly what is needed from the database in order to extract value from the data explosion. We have a good idea of what to do with queries but what is the exact requirement for transformation and alignment of schema and identifiers? What is the actual use case of inference, OWL or other, in this? It is time to get very concrete in terms of applications. We expect a mixed requirement but it is time to look closely at the details.

Interesting post that includes the following observation:

Real-world problems are however harder than just bundling properties, classes, or instances into sets of interchangeable equivalents, which is all we have mentioned thus far. There are differences of modeling (“address as many columns in customer table” vs. “address normalized away under a contact entity”), normalization (“first name” and “last name” as one or more properties; national conventions on person names; tags as comma-separated in a string or as a one-to-many), incomplete data (one customer table has family income bracket, the other does not), diversity in units of measurement (Imperial vs. metric), variability in the definition of units (seven different things all called blood pressure), variability in unit conversions (currency exchange rates), to name a few. What a world!

Yes, quite.

Worth a very close read.

STI Innsbruck

Wednesday, July 6th, 2011

STI Innsbruck

From the about page:

The Semantic Technology Institute (STI) Innsbruck, formerly known as DERI Innsbruck, was founded by Univ.-Prof. Dr. Dieter Fensel in 2002 and has developed into a challenging and dynamic research institute of approximately 40 people. STI Innsbruck collaborates with an international network of institutes in Asia, Europe and the USA, as well as with a number of global industrial partners.

STI Innsbruck is a founding member of STI International, a collaborative association of leading European and world wide initiatives, ensuring the success and sustainability of semantic technology development. STI Innsbruck utilizes this network, as well as contributing to it, in order to increase the impact of the research conducted within the institute. For more details on Semantics, check this interview with Frank Van Harmelen: “Search and you will find“.

I won’t try to summarize the wealth of resources you will find at STI Innsbruck. From the reading list for the curriculum to the listing of tools and publications, you will certainly find material of interest at this site.

For an optimistic view of Semantic Web activity, see the interview with Frank van Harmelen.

Joint International Semantic Technology Conference (JIST2011)

Wednesday, July 6th, 2011

Joint International Semantic Technology Conference (JIST2011) Dec. 4-7, 2011, Hangzhou, China

Important Dates:


– Submissions due: August 15, 2011, 23:59 (11:59pm) Hawaii time

– Notification: September 22, 2011, 23:59 (11:59pm) Hawaii time

– Camera ready: October 3, 2011, 23:59 (11:59pm) Hawaii time

– Conference dates: December 4-7, 2011

From the call:

The Joint International Semantic Technology Conference (JIST) is a regional federation of Semantic Web related conferences. The mission of JIST is to bring together researchers in disciplines related to the semantic technology from across the Asia-Pacific Region. JIST 2011 incorporates the Asian Semantic Web Conference 2011 (ASWC 2011) and Chinese Semantic Web Conference 2011 (CSWC 2011).

Prof. Ian Horrocks (Oxford University) scheduled to present a keynote address.

Providing and discovering definitions of URIs

Wednesday, June 29th, 2011

Providing and discovering definitions of URIs by Jonathan A. Rees.

Abstract:

The specification governing Uniform Resource Identifiers (URIs) [rfc3986] allows URIs to mean anything at all, and this unbounded flexibility is exploited in a variety of contexts, notably the Semantic Web and Linked Data. To use a URI to mean something, an agent (a) selects a URI, (b) provides a definition of the URI in a manner that permits discovery by agents who encounter the URI, and (c) uses the URI. Subsequently other agents may not only understand the URI (by discovering and consulting the definition) but may also use the URI themselves.

A few widely known methods are in use to help agents provide and discover URI definitions, including RDF fragment identifier resolution and the HTTP 303 redirect. Difficulties in using these methods have led to a search for new methods that are easier to deploy, and perform better, than the established ones. However, some of the proposed methods introduce new problems, such as incompatible changes to the way metadata is written. This report brings together in one place information on current and proposed practices, with analysis of benefits and shortcomings of each.

The purpose of this report is not to make recommendations but rather to initiate a discussion that might lead to consensus on the use of current and/or new methods.

The criteria for success:

  1. Simple. Having too many options or too many things to remember makes discovery fragile and impedes uptake.
  2. Easy to deploy on Web hosting services. Uptake of linked data depends on the technology being accessible to as many Web publishers as possible, so should not require control over Web server behavior that is not provided by typical hosting services.
  3. Easy to deploy using existing Web client stacks. Discovery should employ a widely deployed network protocol in order to avoid the need to deploy new protocol stacks.
  4. Efficient. Accessing a definition should require at most one network round trip, and definitions should be cacheable.
  5. Browser-friendly. It should be possible to configure a URI that has a discoverable definition so that ‘browsing’ to it yields information useful to a human.
  6. Compatible with Web architecture. A URI should have a single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name.


I had to look it up to get the page number but I remembered Karl Wiegers in Software Requirements saying:

Feasible

It must be possible to implement each requirement within the known capabilities and limitations of the system and its environment.

The “single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name” requirement is not feasible. It will stymie this project, despite the array of talent on hand, until it is no longer a requirement.

Need proof? Name one URI with a single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name.

Not one that the W3C TAG, or TBL or anyone else thinks/wants/prays has a single agreed meaning globally, … but one that in fact has such a global meaning.

It’s been more than ten years. Let’s drop the last requirement and let the rather talented group working on this come up with a solution that meets the other five (5) requirements.

It won’t be a universal solution but then neither is the WWW.

LarKC: The Large Knowledge Collider

Wednesday, June 29th, 2011

LarKC: The Large Knowledge Collider

A tweet about a video on LarKC sent me looking for the project. From the webpage:

The aim of the EU FP 7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC, for short, pronounced “lark”), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web.

This will be achieved by:

  • Enriching the current logic-based Semantic Web reasoning methods with methods from information retrieval, machine learning, information theory, databases, and probabilistic reasoning,
  • Employing cognitively inspired approaches and techniques such as spreading activation, focus of attention, reinforcement, habituation, relevance reasoning, and bounded rationality.
  • Building a distributed reasoning platform and realizing it both on a high-performance computing cluster and via “computing at home”.

I was listening to the video while writing this post. Did I hear correctly that data would have to be transformed into a uniform format or vocabulary? Listen for yourself: http://videolectures.net/larkcag09_vanharmelen_llkc/, around the 12:00 mark and following.

I also noticed on the project homepage:


Start: 01-April-08
End: 30-Sep-11
Duration: 42 months

So, what happens to LarKC on 1-Oct-11?

Why Schema.org Will Win

Monday, June 13th, 2011

It isn’t hard to see why schema.org is going to win out over “other” semantic web efforts.

The first paragraph at the schema.org website says why:

This site provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages.

  • Easy: Uses HTML tags
  • Immediate Utility: Recognized by Bing, Google and Yahoo!
  • Immediate Payoff: People can find the right web pages (your web pages)
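“Easy” is not an exaggeration: the markup is nothing more than three attributes sprinkled over ordinary HTML tags. A minimal microdata sketch in the style of schema.org’s own documentation:

```html
<!-- schema.org microdata: itemscope opens an item, itemtype names its
     schema.org type, itemprop labels a property of the enclosing item. -->
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span>Director:
    <span itemprop="director" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">James Cameron</span>
    </span>
  </span>
  <span itemprop="genre">Science fiction</span>
</div>
```

No namespaces to declare, no separate RDF document to publish: the page the author was writing anyway becomes the data.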

Ironic that when HTML came on the scene, any number of hypertext engines offered more complex and useful approaches to hypertext.

But the advantages of HTML were:

  • Easy: Used simple tags
  • Immediate Utility: Useful to the author
  • Immediate Payoff: Joins hypertext network for others to find (your web pages)

I think the third advantage in each case is the crucial one. We are vain enough that making our information more findable is a real incentive, if there is a reasonable expectation of it being found. Today or tomorrow. Not ten years from now.

Linking Science and Semantics… (webinar)
15 June 2011 – 10 AM PT (17:00 GMT)

Monday, June 13th, 2011

Linking science and semantics with the Annotation Ontology and the SWAN Annotation Tool

Abstract:

The Annotation Ontology (AO) is an open ontology in OWL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables “stand-off” or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation.

The SWAN Annotation Tool, recently renamed DOMEO (Document Metadata Exchange Organizer), is an extensible web application enabling users to visually and efficiently create and share ontology-based stand-off annotation metadata on HTML or XML document targets, using the Annotation Ontology RDF model. The tool supports manual, fully automated, and semi-automated annotation with complete provenance records, as well as personal or community annotation with access authorization and control.
[AO] http://code.google.com/p/annotation-ontology

I’m interested in how “stand-off” annotation is being handled, being an overlapping markup person myself. I’m also curious how close it comes to HyTime-like mechanisms.
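For readers who haven’t met the idea, stand-off annotation can be sketched in a few lines: the metadata lives apart from the document and is anchored to it by character offsets, with the quoted text kept as a sanity check against later edits. The record layout below is my own illustration, not the Annotation Ontology’s actual RDF model:

```python
# Sketch of stand-off annotation: the annotator needs no write access to
# the target document, and annotations may overlap freely since they never
# touch the document's own markup. Offsets and bodies are illustrative.

document = "The SWAN tool annotates scientific documents on the web."

annotations = [
    # (start, end, quoted text, annotation body)
    (4, 13, "SWAN tool", "renamed DOMEO in 2011"),
    (14, 55, "annotates scientific documents on the web", "AO use case"),
]

def resolve(doc, ann):
    """Re-anchor an annotation, verifying the quoted text still matches."""
    start, end, quote, body = ann
    if doc[start:end] != quote:
        raise ValueError("document changed; anchor no longer valid")
    return doc[start:end], body

for ann in annotations:
    target, body = resolve(document, ann)
    print(f"{target!r} -> {body}")
```

Note that the two annotation ranges could overlap without conflict, which is precisely what inline markup cannot do.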

More after the webinar.

Knoodl!

Wednesday, June 8th, 2011

Knoodl!

From the webpage:

Knoodl is a tool for creating, managing, and analyzing RDF/OWL descriptions. Its many features support collaboration in all stages of these activities. Knoodl’s key component is a semantic software architecture that supports Emergent Analytics. Knoodl is hosted in the Amazon EC2 cloud and can be used for free. It may also be licensed for private use as MyKnoodl.

Mapping to or between the components of RDF/OWL descriptions as subjects will require analysis, as will the simple use of RDF/OWL descriptions. In either case this could be a useful tool.

Semantic Web Dog Food (There’s a fly in my bowl.)

Monday, May 30th, 2011

Semantic Web Dog Food

From the website:

Welcome to the Semantic Web Conference Corpus – a.k.a. the Semantic Web Dog Food Corpus! Here you can browse and search information on papers that were presented, people who attended, and other things that have to do with the main conferences and workshops in the area of Semantic Web research.

We currently have information about

  • 2133 papers,
  • 5020 people and
  • 1273 organisations at
  • 20 conferences and
  • 132 workshops,

and a total of 126886 unique triples in our database!

The numbers looked low to me until I read in the FAQ:

This is not just a site for ISWC [International Semantic Web Conference] and ESWC [European Semantic Web Conference] though. We hope that, in time, other metadata sets relating to Semantic Web activity will be hosted here — additional bibliographic data, test sets, community ontologies and so on.

This illustrates a persistent problem of the Semantic Web. This site has one way to encode the semantics of these papers, people, conferences and workshops. Other sources of semantic data on these papers, people, conferences and workshops may well use other ways to encode those semantics. And every group has what it feels are compelling reasons for following its choices and not the choices of others. Assuming they are even aware of the choices of others. (Discovery being another problem but I won’t talk about that now.)

The previous semantic diversity of natural language is now represented by a semantic diversity of ontologies and URIs. Now our computers can more rapidly and reliably detect that we are using different vocabularies. The SW seems like a lot of work for such a result. Particularly since we continue to use diverse vocabularies and more diverse vocabularies continue to arise.

The SW solution, using OWL Full:

5.2.1 owl:sameAs

The built-in OWL property owl:sameAs links an individual to an individual. Such an owl:sameAs statement indicates that two URI references actually refer to the same thing: the individuals have the same “identity”.

For individuals such as “people” this notion is relatively easy to understand. For example, we could state that the following two URI references actually refer to the same person:

<rdf:Description rdf:about="#William_Jefferson_Clinton">
<owl:sameAs rdf:resource="#BillClinton"/>
</rdf:Description>

The owl:sameAs statements are often used in defining mappings between ontologies. It is unrealistic to assume everybody will use the same name to refer to individuals. That would require some grand design, which is contrary to the spirit of the web.

In OWL Full, where a class can be treated as instances of (meta)classes, we can use the owl:sameAs construct to define class equality, thus indicating that two concepts have the same intensional meaning. An example:

<owl:Class rdf:ID="FootballTeam">
<owl:sameAs rdf:resource="http://sports.org/US#SoccerTeam"/>
</owl:Class>

One could imagine this axiom to be part of a European sports ontology. The two classes are treated here as individuals, in this case as instances of the class owl:Class. This allows us to state that the class FootballTeam in some European sports ontology denotes the same concept as the class SoccerTeam in some American sports ontology. Note the difference with the statement:

<footballTeam owl:equivalentClass us:soccerTeam />

which states that the two classes have the same class extension, but are not (necessarily) the same concepts.

Anyone see a problem? Other than requiring the use of OWL Full?

The absence of any basis for “…denotes the same concept as…”? I can’t safely reuse this axiom because I don’t know on what basis its author made such a claim. The URIs may provide further information that satisfies me the axiom is correct, but that still leaves me in the dark as to why the author of the axiom thought it to be correct. Overly precise for football/soccer ontologies, you say, but what of drug interaction ontologies? Or ontologies that govern highly sensitive intelligence data?

So we repeat semantic diversity, create maps to overcome the repeated semantic diversity, and the maps we create have no explicit basis for the mappings they represent. Tell me again why this was a good idea?
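What a reasoner actually does with owl:sameAs is mechanical and easy to sketch: merge URIs into equivalence classes (union-find, here) so that statements about any one member apply to all. Notice that the data structure has nowhere to record why two URIs were declared the same. The URIs below are illustrative:

```python
# Minimal union-find sketch of owl:sameAs smushing. The merge records the
# *fact* of sameness; the author's rationale for asserting it is lost.

parent = {}

def find(x):
    """Return the canonical representative of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving for efficiency
        x = parent[x]
    return x

def same_as(a, b):
    """Record an owl:sameAs assertion; no basis for the claim is kept."""
    parent[find(a)] = find(b)

same_as("#William_Jefferson_Clinton", "#BillClinton")
same_as("#BillClinton", "http://dbpedia.org/resource/Bill_Clinton")

print(find("#William_Jefferson_Clinton") ==
      find("http://dbpedia.org/resource/Bill_Clinton"))   # True
```

One wrong sameAs link silently merges two classes, and nothing in the structure tells you which assertion to distrust — which is the complaint in a nutshell.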

Easy Semantic Solution Is At Hand! – Post

Thursday, January 27th, 2011

The Federated Enterprise (Using Semantic Technology Standards to Federate Information and to Enable Emergent Analytics)

I had to shorten the title a bit. 😉

Wanted you to be aware of the sort of nonsense that data warehouse people are being told:

The procedure described above enabling federation based on semantic technology is not hard to build; it is just a different way of describing things that people in your enterprise are already describing using incompatible technologies like spreadsheets, text processors, diagramming tools, modeling tools, email, etc. The semantic approach simply requires that everything be described in a single technology, RDF/OWL. This simple change in how things are described enables federation and the paradigm shifting capabilities that accompany it.

Gee, why didn’t we think about that? A single technology to describe everything.

Shakespeare would call this …a tale told by an idiot….

Just thought you could start the day with a bit of amusement.

*****
PS: It’s not the fault of RDF or OWL that people say stupid things about them.