Archive for the ‘Description Logic’ Category

Analyzing Schema.org

Thursday, October 23rd, 2014

Analyzing Schema.org by Peter F. Patel-Schneider.

Abstract:

Schema.org is a way to add machine-understandable information to web pages that is processed by the major search engines to improve search performance. The definition of schema.org is provided as a set of web pages plus a partial mapping into RDF triples with unusual properties, and is incomplete in a number of places. This analysis of and formal semantics for schema.org provides a complete basis for a plausible version of what schema.org should be.

Peter’s analysis is summarized when he says:

The lack of a complete definition of schema.org limits the possibility of extracting the correct information from web pages that have schema.org markup.

Ah, yes, “…the correct information from web pages….”

I suspect the lack of semantic precision has powered the success of schema.org. Each user of schema.org markup has their private notion of the meaning of their use of the markup and there is no formal definition to disabuse them of that notion. Not that formal definitions were enough to save owl:sameAs from varying interpretations.

Schema.org empowers varying interpretations without requiring users to ignore OWL or description logic.

For the domains that schema.org covers (eateries, movies, bars, whore houses, etc.), the semantic slippage permitted by schema.org lowers the bar to using its markup, which has resulted in wider adoption than other proposals.
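A concrete, if contrived, illustration of that slippage (the property names below are real schema.org terms, but the pages and their intended readings are invented): two publishers can emit syntactically valid markup for the same restaurant while meaning different things by the same properties, and nothing in schema.org's definition forces them toward one interpretation.

```python
import json

# Two publishers mark up a "Restaurant" with the same schema.org terms
# but different private notions of what the properties mean.
page_a = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Mario's",            # the sign over the door
    "servesCuisine": "Italian",   # the broad style of the menu
}
page_b = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Mario's LLC",        # the legal entity
    "servesCuisine": "Pizza",     # a single signature dish
}

# Both parse and validate as far as any search engine cares;
# the divergence in meaning is invisible to the markup itself.
for page in (page_a, page_b):
    print(json.dumps(page, indent=2))
```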

The lesson of schema.org is that the degree of semantic slippage you can tolerate depends upon your domain. For pharmaceuticals, I would assume the tolerable degree of slippage is as close to zero as possible. For movie reviews, not so much.

Any effort to impose the same degree of semantic slippage across all domains is doomed to failure.

I first saw this in a tweet by Bob DuCharme.

Life Is Random: Biologists now realize that “nature vs. nurture” misses the importance of noise

Tuesday, September 16th, 2014

Life Is Random: Biologists now realize that “nature vs. nurture” misses the importance of noise by Cailin O’Connor.

From the post:

Is our behavior determined by genetics, or are we products of our environments? What matters more for the development of living things—internal factors or external ones? Biologists have been hotly debating these questions since shortly after the publication of Darwin’s theory of evolution by natural selection. Charles Darwin’s half-cousin Francis Galton was the first to try to understand this interplay between “nature and nurture” (a phrase he coined) by studying the development of twins.

But are nature and nurture the whole story? It seems not. Even identical twins brought up in similar environments won’t really be identical. They won’t have the same fingerprints. They’ll have different freckles and moles. Even complex traits such as intelligence and mental illness often vary between identical twins.

Of course, some of this variation is due to environmental factors. Even when identical twins are raised together, there are thousands of tiny differences in their developmental environments, from their position in the uterus to preschool teachers to junior prom dates.

But there is more to the story. There is a third factor, crucial to development and behavior, that biologists overlooked until just the past few decades: random noise.

In recent years, noise has become an extremely popular research topic in biology. Scientists have found that practically every process in cells is inherently, inescapably noisy. This is a consequence of basic chemistry. When molecules move around, they do so randomly. This means that cellular processes that require certain molecules to be in the right place at the right time depend on the whims of how molecules bump around. (bold emphasis added)
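The chemistry described above can be caricatured in a few lines: run the same stochastic birth/death process for a single protein in two "identical" cells, and the trajectories diverge on randomness alone. (A toy model with made-up rates, not anything taken from the article.)

```python
import random

# Toy birth/death process for one protein species: identical rules,
# identical parameters, different random draws.

def simulate(seed, steps=1000, birth=0.1, death=0.01):
    rng = random.Random(seed)
    count = 0
    for _ in range(steps):
        if rng.random() < birth:                  # a molecule is produced
            count += 1
        if count and rng.random() < death * count:
            count -= 1                            # one molecule decays
    return count

twin_a = simulate(seed=1)
twin_b = simulate(seed=2)
print(twin_a, twin_b)   # same "nature", same "nurture", noisy outcomes
```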

Is another word for “noise” chaos?

The sort of randomness that impacts our understanding of natural languages? That leads us to use different words for the same thing and the same word for different things?

The next time you see a semantically deterministic system be sure to ask if they have accounted for the impact of noise on the understanding of people using the system. 😉

To be fair, no system can, but the pretense that noise doesn’t exist in some semantic environments (think description logic, RDF) is more than a little annoying.

You might want to start following the work of Cailin O’Connor (University of California, Irvine, Logic and Philosophy of Science).

Disclosure: I have always had a weakness for philosophy of science, so your mileage may vary. This is real philosophy of science and not the strained cries of “science” you see on most mailing list discussions.

I first saw this in a tweet by John Horgan.

An Evidential Logic for Multi-Relational Networks

Monday, March 5th, 2012

An Evidential Logic for Multi-Relational Networks by Marko A. Rodriguez and Joe Geldart.

Slide presentation on description and evidential logic.

By the same title, see the article by these authors:

An Evidential Logic for Multi-Relational Networks

Abstract:

Multi-relational networks are used extensively to structure knowledge. Perhaps the most popular instance, due to the widespread adoption of the Semantic Web, is the Resource Description Framework (RDF). One of the primary purposes of a knowledge network is to reason; that is, to alter the topology of the network according to an algorithm that uses the existing topological structure as its input. There exist many such reasoning algorithms. With respect to the Semantic Web, the bivalent, monotonic reasoners of the RDF Schema (RDFS) and the Web Ontology Language (OWL) are the most prevalent. However, nothing prevents other forms of reasoning from existing in the Semantic Web. This article presents a non-bivalent, non-monotonic, evidential logic and reasoner that is an algebraic ring over a multi-relational network equipped with two binary operations that can be composed to execute various forms of inference. Given its multi-relational grounding, it is possible to use the presented evidential framework as another method for structuring knowledge and reasoning in the Semantic Web. The benefits of this framework are that it works with arbitrary, partial, and contradictory knowledge while, at the same time, it supports a tractable approximate reasoning process.
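A rough sketch of what a non-bivalent reasoner can look like (the two operations below are a common probabilistic-semiring choice, offered only as an analogy, not necessarily the ring the paper actually defines): edges carry evidence weights in [0, 1], one binary operation chains evidence along a path, and the other merges evidence from parallel paths.

```python
# Toy evidential inference over a multi-relational network.

def chain(a: float, b: float) -> float:
    """Evidence for a two-step path: both steps must hold."""
    return a * b

def merge(a: float, b: float) -> float:
    """Evidence pooled from two independent parallel paths."""
    return a + b - a * b

# knows(alice, bob) = 0.9, worksWith(bob, carol) = 0.8  -> one path
# knows(alice, dave) = 0.5, worksWith(dave, carol) = 0.6 -> another path
path1 = chain(0.9, 0.8)        # ~0.72
path2 = chain(0.5, 0.6)        # ~0.30
evidence = merge(path1, path2)
print(f"inferred evidence alice->carol: {evidence:.3f}")
```

Composing the two operations this way is what lets partial and even conflicting edges contribute gradations of support rather than a flat true/false.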

Of the two I would recommend the paper over the slides. Just a fuller presentation. (Despite having the same name, these could be represented as separate topics in a topic map.)

A Description Logic Primer

Sunday, January 22nd, 2012

A Description Logic Primer by Markus Krötzsch, Frantisek Simancik and Ian Horrocks.

Abstract:

This paper provides a self-contained first introduction to description logics (DLs). The main concepts and features are explained with examples before syntax and semantics of the DL SROIQ are defined in detail. Additional sections review light-weight DL languages, discuss the relationship to the Web Ontology Language OWL and give pointers to further reading.

It’s an introduction to description logics (DLs), but, more to the point, a readable one. And it will give you a good overview of the area.

As the paper points out, DLs are older than their use with web ontology languages but that is the use that you are most likely to encounter.

You won’t find any new information here, but it may be a good refresher.
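To make the primer’s “syntax and semantics” talk concrete, here is a toy, hand-rolled interpretation of two DL-style constructors (conjunction and existential restriction) over a tiny finite domain. (The individuals and concept names are invented examples, not taken from the paper.)

```python
# A finite interpretation: a domain, concepts as sets, roles as pairs.
domain = {"felix", "rex", "anna"}
Cat    = {"felix"}             # the concept Cat's extension
Dog    = {"rex"}
owns   = {("anna", "felix")}   # the role owns as a set of pairs

def AND(c, d):
    """Semantics of concept conjunction (C AND D): set intersection."""
    return c & d

def EXISTS(role, c):
    """Semantics of the existential restriction (EXISTS r.C):
    everyone related by the role to some member of C."""
    return {x for (x, y) in role if y in c}

CatOwner = EXISTS(owns, Cat)
print(CatOwner)        # {'anna'}
print(AND(Cat, Dog))   # set(): nothing here is both a cat and a dog
```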

Top Three Technologies to Tame the Big Data Beast

Sunday, November 27th, 2011

Top Three Technologies to Tame the Big Data Beast by Steve Hamby.

I would re-order some of Steve’s remarks. For example, on the Semantic Web, why not put those paragraphs first:

The first technology needed to tame Big Data — derived from the “memex” concept — is semantic technology, which loosely implements the concept of associative indexing. Dr. Bush is generally considered the godfather of hypertext based on the associative indexing concept, per his 1945 article. The Semantic Web, paraphrased from a definition by the World Wide Web Consortium (W3C), extends hyperlinked Web pages by adding machine-readable metadata about the Web page, including relationships across Web pages, thus allowing machine agents to process the hyperlinks automatically. The W3C provides a series of standards to implement the Semantic Web, such as Web Ontology Language (OWL), Resource Description Framework (RDF), Rule Interchange Format (RIF), and several others.

The May 2001 Scientific American article “The Semantic Web” by Tim Berners-Lee, Jim Hendler, and Ora Lassila described the Semantic Web as agents that query ontologies representing human knowledge to find information requested by a human. OWL ontology is based on Description Logics, which are both expressive and decidable, and provide a foundation for developing precise models about various domains of knowledge. These ontologies provide the “memory index” that enables searches across vast amounts of data to return relevant, actionable information, while addressing key data trust challenges as well. The ability to deliver semantics to a mobile device, such as what the recent release of the iPhone 4S does with Siri, is an excellent step in taming the Big Data beast, since users can get the data they need when and where they need it. Big Data continues to grow, but semantic technologies provide the needed check points to properly index vital information in methods that imitate the way humans think, as Dr. Bush aptly noted.

Follow that with the amount of data recitation and the comments about Vannevar Bush:

In the July 1945 issue of The Atlantic Monthly, Dr. Vannevar Bush’s famous essay, “As We May Think,” was published as one of the first articles addressing Big Data, information overload, or the “growing mountain of research” as stated in the article. The 2010 IOUG Database Growth Survey, conducted in July-August 2010, estimates that more than a zettabyte (or a trillion gigabytes) of data exists in databases, and that 16 percent of organizations surveyed reported a data growth rate in excess of 50 percent annually. A Gartner survey, also conducted in July-August 2010, reported that 47 percent of IT staffers surveyed ranked data growth as one of the top three challenges faced by their IT organization. Based on two recent IBM articles derived from their CIO Survey, one in three CIOs make decisions based on untrusted data; one in two feel they do not have the data they need to make an informed decision; and 83 percent cite better analytics as a top concern. A recent survey conducted for MarkLogic asserts that 35 percent of respondents believe their unstructured data sources will surpass their structured data sources in size in the next 36 months, while 86 percent of respondents claim that unstructured data is important to their organization. The survey further asserts that only 11 percent of those that consider unstructured data important have an infrastructure that addresses unstructured data.

Dr. Bush conceptualized a “private library,” coined “memex” (mem[ory ind]ex) in his essay, which could ingest the “mountain of research,” and use associative indexing — how we think — to correlate trusted data to support human decision making. Although Dr. Bush conceptualized “memex” as a desk-based device complete with levers, buttons, and a microfilm-based storage device, he recognized that future mechanisms and gadgetry would enhance the basic concepts. The core capabilities of “memex” were needed to allow man to “encompass the great record and to grow in the wisdom of race experience.”

That would allow exploration of questions and comments like:

1) With a zettabyte of data and more coming in every day, precisely how are we going to create/impose OWL ontologies to develop “…precise models about various domains of knowledge?”

2) I’m curious on what grounds hyperlinking is considered the equivalent of associative indexing. Hyperlinks can be used by indexes, but hyperlinking isn’t indexing. Wasn’t then, isn’t now.

3) The act of indexing is collecting references to a list of subjects. Imposing RDF/OWL may be a preparatory step toward indexing, but it is not indexing in and of itself.

4) Description Logics are decidable but why does Steve think human knowledge can be expressed in decidable fashion? There is a vast amount of human knowledge in religion, philosophy, politics, ethics, economics, etc., that cannot be expressed in decidable fashion. Parking regulations can be expressed in decidable fashion, I think, but I don’t know if they are worth the trouble of RDF/OWL.

5) For that matter, where does Steve get the idea that human knowledge is precise? I suppose you could have made that argument in the 1890s; except for some odd cases, classical physics was sufficient. At least until 1905. (Hint: think of Albert Einstein.) Human knowledge is always provisional, uncertain and subject to revision. CERN has apparently observed neutrinos going faster than the speed of light, for example. More revisions of physics are on the way.
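The parking-regulation aside in point 4 can be made concrete: such a rule is decidable in the mundane sense that checking it always terminates with a yes or a no. (The specific rule below is invented for illustration, not taken from any real ordinance.)

```python
from datetime import datetime

# Invented rule: no parking on the north side during street cleaning,
# Mondays 8:00-12:00. The point is only that the check always halts
# with True or False -- unlike most questions of ethics or economics.

def parking_allowed(when: datetime, side: str) -> bool:
    cleaning = (side == "north"
                and when.weekday() == 0     # Monday is 0
                and 8 <= when.hour < 12)
    return not cleaning

print(parking_allowed(datetime(2011, 11, 28, 9, 0), "north"))  # False
print(parking_allowed(datetime(2011, 11, 28, 9, 0), "south"))  # True
```

Whether that decidability is worth the ceremony of an RDF/OWL encoding is, as the post says, another question.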

Part of what we need to tame the big data “beast” is acceptance that we need information systems that are like ourselves.

That is to say information systems that are tolerant of imprecision, perhaps even inconsistency, that don’t offer a false sense of decidability and omniscience. Then at least we can talk about and recognize the parts of big data that remain to be tackled.