Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 11, 2011

1st International Workshop on Semantic Publication (SePublica 2011)

Filed under: Conferences,Ontology,OWL,RDF,Semantic Web,SPARQL — Patrick Durusau @ 7:24 pm

1st International Workshop on Semantic Publication (SePublica 2011) in connection with 8th Extended Semantic Web Conference (ESWC 2011), May 29th or 30th, Hersonissos, Crete, Greece.

From the Call for Papers:

The CHALLENGE of the Semantic Web is to allow the Web to move from a dissemination platform to an interactive platform for networked information. The Semantic Web promises to “fundamentally change our experience of the Web”.

In spite of improvements in the distribution, accessibility and retrieval of information, little has changed in the publishing industry so far. The Web has succeeded as a dissemination platform for scientific and non-scientific papers, news, and communication in general; however, most of that information remains locked up in discrete documents, which are poorly interconnected to one another and to the Web.

The connectivity tissues provided by RDF technology and the Social Web have barely made an impact on scientific communication nor on ebook publishing, neither on the format of publications, nor on repositories and digital libraries. The worst problem is in accessing and reusing the computable data which the literature represents and describes.

No, I am not going to say that topic maps are the magic bullet that will solve all those issues or the ones listed in their Questions and Topics of Interest.

What I do think topic maps bring to the table is an awareness that semantic interoperability isn’t primarily a format or computational problem.

Every new (and impliedly universal) format or model simply compounds the semantic interoperability problem.

By creating yet more formats and/or models between which semantic interoperability has to be designed.

Starting with the question of what subjects need to be identified and how they are identified now could lead to a viable, local semantic interoperability solution.

What more could a client want?

Local semantic interoperability solutions can form the basis for spreading semantic interoperability, one solution at a time.

*****
PS: Forgot the important dates:

Paper/Demo Submission Deadline: February 28, 23:59 Hawaii Time

Acceptance Notification: April 1

Camera Ready Version: April 15

SePublica Workshop: May 29 or May 30 (to be announced)

December 19, 2010

OWL, Ontologies, Formats, Punch Cards, Oh My!

Filed under: Linked Data,OWL,Semantic Web,Topic Maps — Patrick Durusau @ 2:03 pm

Edwin Black’s IBM and the Holocaust reports that one aspect of the use of IBM punch card technology by the Nazis (and others) was the monopoly that IBM maintained on the manufacture of the punch cards.

The IBM machines could only use IBM punch cards.

The IBM machines could only use IBM punch cards.

The repetition was intentional. Think about that statement in a more modern context.

When we talk about Linked Data, or OWL, or Cyc, or SUMO, etc. (yes, I am aware that I am mixing formats and ontologies), isn’t that the same thing?

They are not physical monopolies like IBM punch cards but rather are intellectual monopolies.

Say it this way (insert your favorite format/ontology) or you don’t get to play.

I am sure that meets the needs of software designed to work with particular formats or ontologies.

But that isn’t the same thing as representing user semantics.

Note: Representing user semantics.

Not semantics as seen by the W3C or SUMO or Cyc or (insert your favorite group) or even XTM Topic Maps.

All of those quite usefully represent some user semantics.

None of them represent all user semantics.

No, I am not going to argue there is a non-monopoly solution.

To successfully integrate (or even represent) data, choices have to be made and those will result in a monopoly.

My caution is to not mistake the lip of the teacup that is your monopoly for the horizon of the world.

Very different things.

*****
PS: Economic analysis of monopolies could be useful when discussing intellectual monopolies. The “products” are freely available but the practices have other characteristics of monopolies. (I have added a couple of antitrust books to my Amazon.com wish list should anyone feel moved to contribute.)

December 17, 2010

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics


From the call:

Research on the Semantic Web, which is now in its second decade, has had a tremendous success in encouraging people to publish data on the Web in structured, linked, and standardized ways. The success of what has now become the Web of Data can be read from the sheer number of triples available within the Linked-Open Data, Linked Life Data and Open-Government initiatives. However, this growth in data makes many of the established assumptions inappropriate and offers a number of new research challenges.

In stark contrast to early Semantic Web applications that dealt with small, hand-crafted ontologies and data-sets, the new Web of Data comes with a plethora of contradicting world-views and contains incomplete, inconsistent, incorrect, fast-changing and opinionated information. This information not only comes from academic sources and trustworthy institutions, but is often community built, scraped or translated.

In short: the Web of Data is messy, and methods to deal with this messiness are paramount for its future.

Now, we have two choices as the topic map community:

  • congratulate ourselves for seeing this problem long ago, high five each other, etc., or
  • step up and offer topic map solutions that incorporate as much of the existing SW work as possible.

I strongly suggest the second one.

Important dates:

We will aim at an efficient publication cycle in order to guarantee prompt availability of the published results. We will review papers on a rolling basis as they are submitted and explicitly encourage submissions well before the submission deadline. Submit papers online at the journal’s Elsevier Web site.

Submission deadline: 1 February 2011
Author notification: 15 June 2011

Revisions submitted: 1 August 2011
Final decisions: 15 September 2011
Publication: 1 January 2012

December 8, 2010

Semantic Web – Journal Issue 1/1-2

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 8:18 pm

Semantic Web

The first issue of Semantic Web is openly viewable and now online.

In their introductory remarks the editors focus in part on the journal’s subtitle:

The journal’s subtitle – Interoperability, Usability, Applicability – reflects the wide scope of the journal, by putting an emphasis on enabling new technologies and methods. Interoperability refers to aspects such as the seamless integration of data from heterogeneous sources, on-the-fly composition and interoperation of Web services, and next-generation search engines. Usability encompasses new information retrieval paradigms, user interfaces and interaction, and visualization techniques, which in turn require methods for dealing with context dependency, personalization, trust, and provenance, amongst others, while hiding the underlying computational issues from the user. Applicability refers to the rapidly growing application areas of Semantic Web technologies and methods, to the issue of bringing state-of-the-art research results to bear on real-world applications, and to the development of new methods and foundations driven by real application needs from various domains.

Skimming the table of contents I can see lots of opportunity for comments and rejoinders.

For the present I simply commend this new journal and its contents to you for your reading pleasure.

December 3, 2010

Declared Instance Inferences (DI2)? (RDF, OWL, Semantic Web)

Filed under: Inference,OWL,RDF,Semantic Web,Subject Identity — Patrick Durusau @ 8:49 am

In recent discussions of identity, I have seen statements that OWL reasoners could infer that two or more representatives stood for the same subject.

That’s useful but I wondered if the inferencing overhead is necessary in all such cases?

If a user recognizes that a subject representative (a subject proxy in topic map terms) represents the same subject as another representative, a declarative statement avoids the need for artificial inferencing.

I am sure there are cases where inferencing is useful, particularly to suggest inferences to users, but declared inferences could reduce that need and the overhead.

Declarative information artifacts could be created that contain rules for known identifications.

For example, gene names found in PubMed. If two or more names are declared to refer to the same gene, where is the need for inferencing?

With such declarations in place, no reasoner has to “infer” anything about those names.
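A minimal sketch of what such a declaration artifact might look like, assuming a plain in-memory table. The class name, method names and the sample gene names are illustrative choices of mine, not drawn from PubMed or from any existing tool. Declarations are merged with a union-find structure, so a chain of declared identities is followed by lookup and not by a reasoner:

```python
class DeclaredIdentities:
    """Names that users have explicitly declared to stand for the same subject."""

    def __init__(self):
        self._parent = {}  # name -> representative name (union-find forest)

    def _find(self, name):
        # Unknown names start out as their own representative.
        self._parent.setdefault(name, name)
        while self._parent[name] != name:
            # Path halving keeps later lookups short.
            self._parent[name] = self._parent[self._parent[name]]
            name = self._parent[name]
        return name

    def declare_same(self, a, b):
        """Record a user's declaration that a and b name the same subject."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_subject(self, a, b):
        """True only if a chain of declarations connects a and b."""
        return self._find(a) == self._find(b)


# Illustrative declarations; the names are examples only.
genes = DeclaredIdentities()
genes.declare_same("TP53", "p53")
genes.declare_same("p53", "tumor protein p53")
```

Asking `genes.same_subject("TP53", "tumor protein p53")` is then a table lookup. Anything not declared stays distinct, which is where a reasoner could still be used to suggest candidates to a user.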

Declared instance inferences (DI2) reduce semantic dissonance, inferencing overhead and uncertainty.

Looks like a win-win situation to me.

*****
PS: It occurs to me that ontologies are also “declared instance inferences” upon which artificial reasoners rely. The instances happen to be classes and not individuals.

November 26, 2010

Scalable reduction of large datasets to interesting subsets

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 11:04 am

Scalable reduction of large datasets to interesting subsets
Authors: Gregory Todd Williams, Jesse Weaver, Medha Atre, James A. Hendler
Keywords: Billion Triples Challenge, Scalability, Parallel, Inferencing, Query, Triplestore

Abstract:

With a huge amount of RDF data available on the web, the ability to find and access relevant information is crucial. Traditional approaches to storing, querying, and reasoning fall short when faced with web-scale data. We present a system that combines the computational power of large clusters for enabling large-scale reasoning and data access with an efficient data structure for storing and querying the accessed data on a traditional personal computer or other resource-constrained device. We present results of using this system to load the 2009 Billion Triples Challenge dataset, materialize RDFS inferences, extract an “interesting” subset of the data using a large cluster, and further analyze the extracted data using a personal computer, all in the order of tens of minutes.

I wonder about the use of the phrase “…web-scale data?”

If a billion triples is a real challenge, then what happens when RDF/RDFa is deployed across an entity- and inference-rich body of material like legal texts? Or property descriptions? Or the ownership rights based on property descriptions?

In any event, the prep of the data for inferencing illustrates a use case for topic maps:

Information about people is represented in different ways in the BTC2009 dataset, including the use of the FOAF, SIOC, DBpedia, and AKT ontologies. We create a simple upper ontology to bring together concepts and properties pertaining to people. For example, we define the class up:Person which is defined as a superclass to existing person classes, e.g., foaf:Person. We do the same for relevant properties, e.g., up:full name is a superproperty of akt:full-name. Note that “up” is the namespace prefix for our upper ontology.

What subject represented by akt:full-name was responsible for the mapping in question? How does that translate to other ontologies? Oh, sorry, no place to record that mapping.
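To make the complaint concrete, here is a rough sketch of a mapping record that does have a place to say who made the mapping and why. Only `foaf:Person` and `up:Person` come from the quoted passage; the field names and the placeholder values are my own assumptions, and nothing here reflects how the paper's authors stored their ontology:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRecord:
    source: str       # term being mapped, e.g. "foaf:Person"
    target: str       # term it is mapped under, e.g. "up:Person"
    relation: str     # kind of mapping, e.g. "subClassOf"
    declared_by: str  # hypothetical field: who made the mapping
    basis: str        # hypothetical field: why they believed it holds


def mappings_for(term, records):
    """Return every recorded mapping whose source is the given term."""
    return [r for r in records if r.source == term]


# Placeholder values: the paper does not report this information.
records = [
    MappingRecord(
        source="foaf:Person",
        target="up:Person",
        relation="subClassOf",
        declared_by="(placeholder) dataset curator",
        basis="(placeholder) both classes describe individual people",
    ),
]

for r in mappings_for("foaf:Person", records):
    print(f"{r.source} {r.relation} {r.target} -- by {r.declared_by}: {r.basis}")
```

With a record like this, someone reusing the mapping against another ontology can at least see on what grounds it was made before deciding whether to trust it.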

Questions:

  1. How do you evaluate the claims of “…web-scale data?” (3-5 pages, citations)
  2. Does creating ad-hoc upper ontologies scale? Yes/No/Why? (3-5 pages, citations)
  3. How does interchange of ad-hoc upper ontologies work? (3-5 pages, citations)

Managing Terabytes of Web Semantics Data

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 11:00 am

Managing Terabytes of Web Semantics Data
Authors: Michele Catasta, Renaud Delbru, Nickolai Toupikov, and Giovanni Tummarello

Abstract:

A large amount of semi structured data is now made available on the Web in form of RDF, RDFa and Microformats. In this chapter, we discuss a general model for the Web of Data and, based on our experience in Sindice.com, we discuss how this is reflected in the architecture and components of a large scale infrastructure. Aspects such as data collection, processing, indexing, ranking are touched, and we give an ample example of an applications built on top of said infrastructure.

Appears as Chapter 6 in R. De Virgilio et al. (eds.), Semantic Web Information Management, © Springer-Verlag Berlin Heidelberg 2010.

Hopefully not too repetitious with the other Sindice.com material I have been posting.

It is a good overview of the area, in addition to specifics about Sindice.com.

Semantic Now?

Filed under: Navigation,OWL,RDF,Semantic Web,Topic Maps — Patrick Durusau @ 10:58 am

Visit Semantic Web, then return here (or use a separate browser window).

I went to the Semantic Web page of the W3C looking for a prior presentation and was struck by the semantic now nature of the page.

It isn’t clear how to access older material.

I have to confess to having only a passing interest in self-promotional, puff pieces, including logos.

I assume that is true for many of the competent researchers working with the W3C. (There are a lot of them, this is not a criticism of their work.)

So, where is the interface that enables quick access to substantial materials, including older standards, statements and presentations?

*****
I understand at least some of the W3C site is described in RDF. What degree of detail, precision, I don’t know. Would make a starting point for a topic map of the site.

The other necessary component, and where this page falls down, would be useful navigation choices. That would be the harder problem.

Let me know if you are interested in cracking this nut.

Another Take on the Semantic Web?

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 10:56 am

Bob Ferris constructs a take on the SW at: On Resources, Information Resources and Documents.

Whatever you think of Bob’s vision of the SW, the fundamental problem is one of requiring universal use of a flat identifier (URI).

Which leaves us with string comparison. Different string, different thing being identified.

Some of the better SW software now evaluates RDF graphs for identification of entities.

Not all that different from how we identify entities.

Departs from the URI = Identifier basis of the SW, but to be useful, that was inevitable.

Two more challenges face the SW (where topic maps can help, there are others):

1) How to communicate to other users what parts of an RDF graph to match for identity purposes? (including matching on subparts)

2) How to communicate to other users when non-isomorphic RDF graphs are semantically equivalent?
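A small sketch of the difference between the two approaches, assuming entity descriptions are flattened into plain dictionaries (a big simplification of an RDF graph). The function names and the example URIs are hypothetical:

```python
def same_by_uri(a, b):
    """URI = identifier: different string, different thing."""
    return a["uri"] == b["uri"]


def same_by_properties(a, b, keys):
    """Treat two descriptions as the same subject when both carry every
    listed key and agree on all of them. An empty key list matches nothing."""
    return bool(keys) and all(
        k in a and k in b and a[k] == b[k] for k in keys
    )


# Two descriptions of one person published under different (made-up) URIs.
doc_a = {"uri": "http://example.org/people/1", "name": "Ada Lovelace", "born": "1815"}
doc_b = {"uri": "http://example.com/id/ada", "name": "Ada Lovelace", "born": "1815"}

print(same_by_uri(doc_a, doc_b))                           # False
print(same_by_properties(doc_a, doc_b, ("name", "born")))  # True
```

The `keys` argument is the part that the first challenge says has to be communicated: unless the choice of matching properties travels with the data, another user has no way to reproduce the identification.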

More on those issues anon.

November 16, 2010

In Defense of Ambiguity

Filed under: OWL,RDF,Semantic Web,Subject Identity — Patrick Durusau @ 5:49 pm

In Defense of Ambiguity by Patrick J. Hayes and Harry Halpin was cited in David Booth’s article, so like any academic, I had to go read the cited paper. 😉

Highly recommended.

The authors conclude:

Regardless of the details, the use of any technology in Web architecture to distinguish between access and reference, including our proposed ex:refersTo and ex:describedBy, does nothing more than allow the author of a URI to explain how they would like the URI to be used. Ultimately, there is nothing that Web architecture can do to prevent a URI from being used to refer to some thing non-accessible. However, at least having a clear and coherent device, such as a few RDF predicates, would allow the distinction to be made so the author could give guidance on what they believe best practice for their URI would be. This would vastly improve the situation from where it is today, where this distinction is impossible. The philosophical case for the distinction between reference and access is clear. The main advantage of Web architecture is that there is now a de facto universal identification scheme for accessing networked resources. With the Semantic Web, we can now extend this scheme to the wide world outside the Web by use of reference. By keeping the distinction between reference and access clear, the lemons of ambiguity can be turned into lemonade. Reference is inherently ambiguous, and ambiguity is not an error of communication, but fundamental to the success of communication both on and off the Web.

Sounds like the distinction between subject locators and identifiers that topic maps made long before this paper was written.

Resource Identity and Semantic Extensions: Making Sense of Ambiguity

Filed under: OWL,RDF,Semantic Web,Subject Identity — Patrick Durusau @ 5:29 pm

Resource Identity and Semantic Extensions: Making Sense of Ambiguity, David Booth’s paper, was cited by Bernard Vatant, so I had to go take a look.

Bernard says: “The best analysis of the issue I’ve read so far.” I have to agree.

From the paper’s conclusion:

In general, a URI’s resource identity will necessarily be ambiguous. But this is not the end of the world. Rather, it means that while it may be unambiguous enough for one application, another application may require finer distinctions and thus consider it ambiguous. However, this ambiguity of resource identity can be precisely constrained by the use of URI declarations. Finally, a standard process is proposed for determining a URI’s resource identity.

Ambiguity is part and parcel of any system and the real question is how much can you tolerate?

For some systems that is quite a bit, for others, air traffic controllers come to mind, as little as possible.

Other identifiers are ambiguous as well.

Successful integration of data across systems depends on how well we deal with that ambiguity.

November 8, 2010

BibBase and Beyond

Filed under: BibTeX,OWL,RDF,Semantic Web — Patrick Durusau @ 8:38 am

BibBase is an effort to store BibTeX information as RDF triples. For the data, see: BibBase data.

As of 8 November 2010, there are 6178 publications.

Interesting I suppose but the real question is how to enable researchers using BibTeX to disambiguate their terminology as part of their BibTeX entry?

Has to be as easy as BibTeX and consistent with usage patterns in the communities that use it. If you hope for adoption.

Not hard to imagine a helper application that runs through a set of BibTeX entries and suggests 1998 ACM Computing Classification System or 2010 Mathematics Subject Classification entries. Entries which the author could accept or reject.
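A toy version of such a helper, to show how little machinery a first cut needs. Everything here is an assumption on my part: the keyword table is hypothetical (check the category codes against the published ACM scheme before relying on them), the sample entry is made up, and the regular expressions only cope with simple entries whose title contains no nested braces:

```python
import re

# Hypothetical keyword table; a real helper would need a far richer one.
KEYWORD_HINTS = {
    "retrieval": "H.3.3 Information Search and Retrieval",
    "ontology": "I.2.4 Knowledge Representation Formalisms and Methods",
}

# Simplifying assumptions: each entry closes with "}" on a line of its own,
# and title values contain no nested braces.
ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.S)
TITLE = re.compile(r"\btitle\s*=\s*\{([^{}]*)\}", re.I)


def suggest(bibtex):
    """Yield (citation key, suggested category) pairs for the author to accept or reject."""
    for key, body in ENTRY.findall(bibtex):
        match = TITLE.search(body)
        title = match.group(1).lower() if match else ""
        for word, category in KEYWORD_HINTS.items():
            if word in title:
                yield key, category


# Made-up entry, for illustration only.
SAMPLE = """@article{doe2010,
  title = {Ontology Mapping for Document Retrieval},
  author = {Doe, Jane},
  year = {2010}
}
"""

for key, category in suggest(SAMPLE):
    print(f"{key}: suggest {category}? [accept/reject]")
```

The helper only proposes. The accept or reject step stays with the author, which is what keeps the resulting classification a statement of the author's semantics and not the tool's.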

Not the fine grained, concept by concept (read subject by subject) analysis of a document that I would like to see, but it’s a start.

October 20, 2010

8th Extended Semantic Web Conference: May 29 – June 2 2011 Heraklion, Greece

Filed under: Conferences,Ontology,OWL,Semantic Web,Semantics,SPARQL — Patrick Durusau @ 3:15 am

8th Extended Semantic Web Conference: May 29 – June 2 2011 Heraklion, Greece

Important Dates

See ESWC 2010 for range of content.

October 8, 2010

Semantic Drift and Linked Data/Semantic Web

Filed under: Linked Data,OWL,Semantic Web,Subject Identity — Patrick Durusau @ 10:28 am

Overloading OWL sameAs starts with:

Description: General Issue: owl:sameAs is being used in the linked data community in a way that is inconsistent with its semantics.

Read the document but in summary: People use OWL sameAs to mean different things.

I don’t see how their usage can be “inconsistent with its semantics.”

Words don’t possess self-executing semantics that bind us. Rather the other way round I think.

If OWL sameAs had some “original” semantic, it changed by the process of semantic drift.

Semantic drift is where the semantics of a token changes over time or across communities due to its use by people.

URIs or tokens may be “stable,” but the evidence is that the semantics of URIs or tokens are not.

The question is how to manage changing, emerging, drifting semantics? (Not a question answered by a static semantic model of URI based identity.)

PS: RDF researchers have recognized semantic drift and have proposed solutions for addressing it. More on that anon.

Questions:

  • Select a classification more than 30 years old and randomly select one book for each 5 year period for the last 30 years. What (if any) semantic drift do you see in the use of this classification?
  • Exchange your list with a classmate. Do you agree/disagree with their evaluation? Why?
  • Repeat the exercise in #1 and #2 but use a classification where you can find books between 30 and 60 years ago. Select one book per 5 year period.

July 14, 2010

Coreference via substitution rules – Post

Filed under: Mapping,OWL — Patrick Durusau @ 11:16 am

Coreference via substitution rules by Bernard Vatant develops two interesting notions:

  • Using substitution to test interchange of references
  • Using operational rules rather than declarative assertions

See his post for the full details.

He uses context to define when one reference could be substituted for another.

Also observes that any mapping, such as owl:sameAs can be abused.

As with many things, semantic integration may be less a technical issue than a human one. Semantic integration tools aren’t going to lead to semantic integration unless we use them with semantic integration as a goal.

July 5, 2010

Closed World vs. Open World: the First Semantic Web Battle – From Stefano’s Linotype

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 7:20 pm

Closed World vs. Open World: the First Semantic Web Battle from Stefano’s Linotype is well worth your time.

See also Stack or Two Towers. Seems like one universal world view may not be as robust as previously thought.

Interesting that non-universal treatment of “doubt” may split the Semantic Web into incompatible parts. Can you say fragile?

June 4, 2010

I do not think it means what you think it means

Filed under: Ontology,OWL,RDF,Semantic Web,Software — Patrick Durusau @ 4:30 am

I do not think it means what you think it means by Taylor Cowan is a deeply amusing take on Pellet, an OWL 2 Reasoner for Java.

I particularly liked the line:

I believe the semantic web community is falling into the same trap that the AI community fell into, which is to grossly underestimate the meaning of “reason”. As Inigo Montoya says in the Princess Bride, “You keep using that word. I do not think it means what you think it means.”

(For an extra 5 points, what is the word?)

Taylor’s point that Pellet will underscore unstated assumptions in an ontology and make sure that your ontology is consistent is a good one. If you are writing an ontology to support inferences that is a good thing.

Topic maps can support “consistent” ontologies but I find encouragement in their support for how people actually view the world as well. That some people “logically” infer from Boeing 767 -> “means of transportation” should not prevent me from capturing that some people “logically” infer -> “air-to-ground weapon.”

A formal reasoning system could be extended to include that case, but can that be done as soon as an analyst has that insight or must it be carefully crafted and tested to fit into a reasoning system when “the lights are blinking red?”
