Another Word For It – Patrick Durusau on Topic Maps and Semantic Diversity

February 21, 2011

Topic Maps and the Semantic Web

Filed under: Semantic Web,Topic Maps — Patrick Durusau @ 6:56 am

As I pointed out in Topic Maps in < 5 Minutes, topic maps take it as given that:

salt

sodium chloride

NaCl

are all legitimate identifiers for the same subject.

That is called semantic diversity. (There are more ways to identify salt but let’s stick with those for the moment.)
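To see what that buys you, here is a minimal sketch (my own, not any particular topic map engine) of identifier-based merging: proxies that share an identifier collapse into one subject and every name survives. The identifiers are illustrative.

    def merge_proxies(proxies):
        """Merge subject proxies that share at least one identifier."""
        subjects = []
        for proxy in proxies:
            ids = set(proxy["identifiers"])
            # Find an existing subject that shares an identifier with this proxy.
            match = next((s for s in subjects if s["identifiers"] & ids), None)
            if match:
                match["identifiers"] |= ids            # union, nothing discarded
                match["names"] |= set(proxy["names"])
            else:
                subjects.append({"identifiers": ids, "names": set(proxy["names"])})
        return subjects

    proxies = [
        {"identifiers": {"chem:NaCl"}, "names": {"salt"}},
        {"identifiers": {"chem:NaCl", "cas:7647-14-5"}, "names": {"sodium chloride"}},
        {"identifiers": {"cas:7647-14-5"}, "names": {"NaCl"}},
    ]

    for subject in merge_proxies(proxies):
        print(sorted(subject["names"]), "->", sorted(subject["identifiers"]))
    # One merged subject, three legitimate identifiers, three names intact.

(A full implementation would also merge transitively when a late proxy bridges two existing subjects; this sketch leans on input order.)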

Contrast that with the Semantic Web, which wants to have one identifier for salt and, consequently, no semantic diversity.

You may ask yourself, what happens to all the previous identifications of salt, literally thousands of different ways to identify it?

Do we have to re-write all those identifiers across the vast sweep of human literature?

Or you may ask yourself, given that the conditions that lead to semantic diversity still exist, how is future semantic diversity to be avoided?

Good questions.

Or, for that matter, consider the changing and growing information structures where we are storing petabytes of data. What about semantic diversity there as well?

I don’t know.

Maybe we should ask Tim Berners-Lee, timbl @ w3.org?

*****
PS: And this is a really easy subject to identify. Just think about democracy, human rights, justice, or any of the other thousands of subjects, all of which are works in progress.

February 18, 2011

Linked Data-a-thon – ISWC 2011

Filed under: Conferences,Linked Data,Marketing,Semantic Web — Patrick Durusau @ 5:17 am

Linked Data-a-thon http://iswc2011.semanticweb.org/calls/linked-data-a-thon/

I looked at the requirements for the Linked Data-a-thon, which include:

  • make use of Linked Data consumed from multiple data sources
  • be able to make use of additional data from other Linked Data sources
  • be accessible from the Web
  • satisfy the special requirement which will be announced on October 1, 2011.

It would not be hard to fashion a topic map application that consumed Linked Data, made use of additional data from other Linked Data sources and was accessible from the Web.

What would be interesting would be to reliably integrate other information sources that are not Linked Data with Linked Data sources.

Don’t know about the special requirement.

At least one person on a team would actually have to attend the conference to enter.

Anyone interested in discussing such an entry?

Suggested Team title: Linked Data Cake (1 Tsp Linked Data, 8 Cups Non-Linked Data, TM Oven – Set to Merge)

Kinda long and pushy but why not?

What better marketing pitch for topic maps than to leverage present investments in Linked Data into a meaningful result with non-linked data?

It isn’t like there is a shortage of non-linked data to choose from. 😉

10th International Semantic Web Conference (ISWC 2011) – Call for Papers

Filed under: Conferences,Ontology,Semantic Web — Patrick Durusau @ 5:15 am

10th International Semantic Web Conference (ISWC 2011) – Call for Papers

The 10th International Semantic Web Conference (ISWC 2011) will be in Bonn, Germany, October 23-27, 2011.

From the call:

Key Topics

  • Management of Semantic Web Data
  • Natural Language Processing
  • Ontologies and Semantics
  • Semantic Web Engineering
  • Social Semantic Web
  • User Interfaces to the Semantic Web
  • Applications of the Semantic Web

Tracks and Due Dates:

Research Papers
http://iswc2011.semanticweb.org/calls/research-papers/

Semantic Web In Use
http://iswc2011.semanticweb.org/calls/semantic-web-in-use/

Posters and Demos
http://iswc2011.semanticweb.org/calls/posters-and-demos/

Doctoral Consortium
http://iswc2011.semanticweb.org/calls/doctoral-consortium/

Tutorials http://iswc2011.semanticweb.org/calls/tutorials/

Workshops http://iswc2011.semanticweb.org/calls/workshops/

Semantic Web Challenge http://iswc2011.semanticweb.org/calls/semantic-web-challenge/

Linked Data-a-thon
http://iswc2011.semanticweb.org/calls/linked-data-a-thon/

February 11, 2011

Sowa on Watson

Filed under: Cyc,Ontology,Semantic Web,Subject Identifiers,Subject Identity — Patrick Durusau @ 6:43 am

John Sowa’s posting on Watson merits reproduction in its entirety (with light editing to format it for easy reading):

Peter,

Thanks for the reminder:

Dave Ferrucci gave a talk on UIMA (the Unstructured Information Management Architecture) back in May-2006, entitled: “Putting the Semantics in the Semantic Web: An overview of UIMA and its role in Accelerating the Semantic Revolution”

I recommend that readers compare Ferrucci’s talk about UIMA in 2006 with his talk about the Watson system and Jeopardy in 2011. In less than 5 years, they built Watson on the UIMA foundation, which contained a reasonable amount of NLP tools, a modest ontology, and some useful tools for knowledge acquisition. During that time, they added quite a bit of machine learning, reasoning, statistics, and heuristics. But most of all, they added terabytes of documents.

For the record, following are Ferrucci’s slides from 2006:

http://ontolog.cim3.net/file/resource/presentation/DavidFerrucci_20060511/UIMA-SemanticWeb–DavidFerrucci_20060511.pdf

Following is the talk that explains the slides:

http://ontolog.cim3.net/file/resource/presentation/DavidFerrucci_20060511/UIMA-SemanticWeb–DavidFerrucci_20060511_Recording-2914992-460237.mp3

And following is his recent talk about the DeepQA project for building and extending that foundation for Jeopardy:

http://www-943.ibm.com/innovation/us/watson/watson-for-a-smarter-planet/building-a-jeopardy-champion/how-watson-works.html

Compared to Ferrucci’s talks, the PBS Nova program was a disappointment. It didn’t get into any technical detail, but it did have a few cameo appearances from AI researchers. Terry Winograd and Pat Winston, for example, said that the problem of language understanding is hard.

But I thought that Marvin Minsky and Doug Lenat said more with their tone of voice than with their words. My interpretation (which could, of course, be wrong) is that both of them were seething with jealousy that IBM built a system that was competing with Jeopardy champions on national TV — and without their help.

In any case, the Watson project shows that terabytes of documents are far more important for commonsense reasoning than the millions of formal axioms in Cyc. That does not mean that the Cyc ontology is useless, but it undermines the original assumptions for the Cyc project: commonsense reasoning requires a huge knowledge base of hand-coded axioms together with a powerful inference engine.

An important observation by Ferrucci: The URIs of the Semantic Web are *not* useful for processing natural languages — not for ordinary documents, not for scientific documents, and especially not for Jeopardy questions:

1. For scientific documents, words like ‘H2O’ are excellent URIs. Adding an http address in front of them is pointless.

2. A word like ‘water’, which is sometimes a synonym for ‘H2O’, has an open-ended number of senses and microsenses.

3. Even if every microsense could be precisely defined and cataloged on the WWW, that wouldn’t help determine which one is appropriate for any particular context.

4. Any attempt to force human being(s) to specify or select a precise sense cannot succeed unless *every* human understands and consistently selects the correct sense at *every* possible occasion.

5. Given that point #4 is impossible to enforce and dangerous to assume, any software that uses URIs will have to verify that the selected sense is appropriate to the context.

6. Therefore, URIs found “in the wild” on the WWW can never be assumed to be correct unless they have been guaranteed to be correct by a trusted source.

These points taken together imply that annotations on documents can’t be trusted unless (a) they have been generated by your own system or (b) they were generated by a system which is at least as trustworthy as your own and which has been verified to be 100% compatible with yours.

In summary, the underlying assumptions for both Cyc and the Semantic Web need to be reconsidered.

You can see the post at: http://ontolog.cim3.net/forum/ontolog-forum/2011-02/msg00114.html

I don’t always agree with Sowa but he has written extensively on conceptual graphs, knowledge representation and ontological matters. See http://www.jfsowa.com/

I missed the local showing but found the video at: Smartest Machine on Earth.

You will find a link to an interview with Minsky at that same location.

I don’t know that I would describe Minsky as “…seething with jealousy….”

While I enjoy Jeopardy, and it is certainly more cerebral than, say, American Idol, I think Minsky is right in seeing the Watson effort as something other than artificial intelligence.

Q: In 2011, who was the only non-sentient contestant on the TV show Jeopardy?

A: What is IBM’s Watson?

February 9, 2011

Oyster: A Configurable ER Engine

Filed under: Entity Resolution,Record Linkage,Semantic Web,Subject Identity — Patrick Durusau @ 4:55 pm

Oyster: A Configurable ER Engine

John Talburt writes a very enticing overview of an entity resolution engine he calls Oyster.

From the post:

OYSTER will be unique among freely available systems in that it supports identity management and identity capture. This allows the user to configure OYSTER to not only run as a typical merge-purge/record linking system, but also as an identity capture and identity resolution system. (Emphasis added)

Yes, record linking we have had since the late 1950s, in a variety of guises and under more than twenty (20) different names that I know of.

Adding identity management and identity capture (FYI, SW uses universal identifier assignment) will be something truly different.

As in topic map different.
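A back-of-the-envelope sketch of the difference (mine, not OYSTER’s actual design): record linking decides which records match; identity capture also assigns a persistent identity that later runs reuse. The match rule below is deliberately naive.

    import itertools

    counter = itertools.count(1)
    identities = {}   # persistent store: match key -> captured identity id

    def resolve(record):
        key = (record["name"].lower(), record["dob"])    # naive match rule
        if key not in identities:
            identities[key] = "ID-%04d" % next(counter)  # identity capture
        return identities[key]

    records = [
        {"name": "Jane Doe", "dob": "1970-01-01", "source": "payroll"},
        {"name": "jane doe", "dob": "1970-01-01", "source": "crm"},
        {"name": "John Roe", "dob": "1980-05-05", "source": "crm"},
    ]

    for r in records:
        print(resolve(r), r["source"])
    # Both Jane Doe records resolve to ID-0001. A pure merge-purge run
    # produces matches and then forgets them; keeping the identities store
    # is what turns record linking into identity management.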

Will be keeping a close watch on this project and suggest that you do the same.

January 27, 2011

Easy Semantic Solution Is At Hand! – Post

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 6:15 am

The Federated Enterprise (Using Semantic Technology Standards to Federate Information and to Enable Emergent Analytics)

I had to shorten the title a bit. 😉

Wanted you to be aware of the sort of nonsense that data warehouse people are being told:

The procedure described above enabling federation based on semantic technology is not hard to build; it is just a different way of describing things that people in your enterprise are already describing using incompatible technologies like spreadsheets, text processors, diagramming tools, modeling tools, email, etc. The semantic approach simply requires that everything be described in a single technology, RDF/OWL. This simple change in how things are described enables federation and the paradigm shifting capabilities that accompany it.

Gee, why didn’t we think about that? A single technology to describe everything.

Shakespeare would call this …a tale told by an idiot….

Just thought you could start the day with a bit of amusement.

*****
PS: It’s not the fault of RDF or OWL that people say stupid things about them.

When supercomputers meet the Semantic Web – Post

Filed under: Linked Data,Searching,Semantic Web — Patrick Durusau @ 5:59 am

When supercomputers meet the Semantic Web

Jack Park forwarded the link to this post.

It has descriptions like:

Everything about the hardware is optimised to churn through large quantities of data, very quickly, with vital statistics that soon become silly. A single processor “can sustain 128 simultaneous threads and is connected with up to 8 GB of memory.” The Cray XMT comes with at least 16 of those processors, and can scale to over 8,000 of them in order to handle over 1 million simultaneous threads with 64 TB of shared system memory. Should you want to, you could easily hold the entire Linked Data Cloud in main memory for rapid analysis without the usual performance bottleneck introduced by swapping data on and off disks.

Now, that’s computing!

Do take note of the emphasis on graph processing.

I think Semantic Web and topic map fans would do well to pay attention to the big data movement mentioned in this article.

Imagine a topic map whose topics emerge in interaction with subject matter experts querying the data as opposed to being statically authored.

Same for associations between subjects and even their association types.

Still topic maps, just a different way to think about authoring them.

I don’t have a Cray XMT but it should be possible to practice emergent topic map authoring on a smaller device.

I rather like that, emergent topic map authoring, ETMA.

Let me push that around a bit and I will post further notes about it.

January 22, 2011

Making Linked Data work isn’t the problem – Post

Filed under: Linked Data,Semantic Web,Topic Maps — Patrick Durusau @ 7:13 am

Making Linked Data work isn’t the problem

Georgi Kobilarov captures an essential question when he says:

But neither Linked Open Data nor the Semantic Web have really took off from there yet. I know many people will disagree with me and point to the famous Linked Open Data cloud diagram, which shows a large (and growing) number of data sets as part of the Linked Data Web. But where are the showcases of problems being solved?

If you can’t show me problems being solved then something is wrong with the solution. “we need more time” is rarely the real issue, esp. when there is some inherent network effect in the system. Then there should be some magic tipping point, and you’re just not hitting it and need to adjust your product and try again with a modified approach.

My point here is not that I want to propose any particular direction or change, but instead I want to stress what I believe is an issue in the community: too few people are actually trying to understand the problem that Linked Data is supposed to be the solution to. If you don’t understand the problem you can not develop a solution or improve a half-working one. Why? Well, what do you do next? Which part to work on? What to change? There is no ground for those decisions if you don’t have at least a well informed guess (or better some evidence) about the problem to solve. And you can’t evaluate your results.

You could easily substitute topic maps in place of linked data in that quote.

Questions:

Putting global claims to one side, write a 5 – 8 page paper, with citations, answering the following questions:

  1. What specific issue in your library would topic maps help solve? As opposed to what other solutions?
  2. Would topic maps require more or less resources than other solutions?
  3. Would topic maps offer any advantages over other solutions?
  4. How would you measure/estimate the answers in #2 and #3 for a proposal to your library board/director?

(Feel free to suggest and answer other questions I have overlooked.)

January 11, 2011

1st International Workshop on Semantic Publication (SePublica 2011)

Filed under: Conferences,Ontology,OWL,RDF,Semantic Web,SPARQL — Patrick Durusau @ 7:24 pm

1st International Workshop on Semantic Publication (SePublica 2011) in connection with 8th Extended Semantic Web Conference (ESWC 2011), May 29th or 30th, Hersonissos, Crete, Greece.

From the Call for Papers:

The CHALLENGE of the Semantic Web is to allow the Web to move from a dissemination platform to an interactive platform for networked information. The Semantic Web promises to “fundamentally change our experience of the Web”.

In spite of improvements in the distribution, accessibility and retrieval of information, little has changed in the publishing industry so far. The Web has succeeded as a dissemination platform for scientific and non-scientific papers, news, and communication in general; however, most of that information remains locked up in discrete documents, which are poorly interconnected to one another and to the Web.

The connectivity tissues provided by RDF technology and the Social Web have barely made an impact on scientific communication nor on ebook publishing, neither on the format of publications, nor on repositories and digital libraries. The worst problem is in accessing and reusing the computable data which the literature represents and describes.

No, I am not going to say that topic maps are the magic bullet that will solve all those issues or the ones listed in their Questions and Topics of Interest.

What I do think topic maps bring to the table is an awareness that semantic interoperability isn’t primarily a format or computational problem.

Every new (and impliedly universal) format or model simply compounds the semantic interoperability problem.

By creating yet more formats and/or models between which semantic interoperability has to be designed.

Starting with the question of what subjects need to be identified and how they are identified now could lead to a viable, local semantic interoperability solution.

What more could a client want?

Local semantic interoperability solutions can form the basis for spreading semantic interoperability, one solution at a time.

*****
PS: Forgot the important dates:

Paper/Demo Submission Deadline: February 28, 23:59 Hawaii Time

Acceptance Notification: April 1

Camera Ready Version: April 15

SePublica Workshop: May 29 or May 30 (to be announced)

Dynamic Semantic Publishing for any Blog (Part 1 + 2) – Post(s)

Filed under: Entity Extraction,Semantic Web,Semantics — Patrick Durusau @ 5:10 pm

Dynamic Semantic Publishing for any Blog (Part 1)

Benjamin Nowack outlines how he would replicate the dynamic semantic publishing approach used by the BBC in their coverage of the 2010 World Cup.

Dynamic Semantic Publishing for any Blog (Part 2) will disappoint anyone interested in developing dynamic semantic publishing solutions.

Block level overview that repeats what anyone interested in semantic technologies already knows.

Extended infomercial.

Save your time and look elsewhere for substantive content on semantic publishing.

December 31, 2010

Libraries and the Semantic Web (video)

Filed under: Library,Semantic Web — Patrick Durusau @ 6:15 am

Libraries and the Semantic Web (video)

This is a very amusing video.

December 19, 2010

OWL, Ontologies, Formats, Punch Cards, Oh My!

Filed under: Linked Data,OWL,Semantic Web,Topic Maps — Patrick Durusau @ 2:03 pm

Edwin Black’s IBM and the Holocaust reports that one aspect of the use of IBM punch card technology by the Nazis (and others) was the monopoly that IBM maintained on the manufacture of the punch cards.

The IBM machines could only use IBM punch cards.

The IBM machines could only use IBM punch cards.

The repetition was intentional. Think about that statement in a more modern context.

When we talk about Linked Data, or OWL, or Cyc, or SUMO, etc. (yes, I am aware that I am mixing formats and ontologies), isn’t that the same thing?

They are not physical monopolies like IBM punch cards but rather are intellectual monopolies.

Say it this way (insert your favorite format/ontology) or you don’t get to play.

I am sure that meets the needs of software designed to work with particular formats or ontologies.

But that isn’t the same thing as representing user semantics.

Note: Representing user semantics.

Not semantics as seen by the W3C or SUMO or Cyc or (insert your favorite group) or even XTM Topic Maps.

All of those quite usefully represent some user semantics.

None of them represent all user semantics.

No, I am not going to argue there is a non-monopoly solution.

To successfully integrate (or even represent) data, choices have to be made and those will result in a monopoly.

My caution is to not mistake the lip of the teacup that is your monopoly for the horizon of the world.

Very different things.

*****
PS: Economic analysis of monopolies could be useful when discussing intellectual monopolies. The “products” are freely available but the practices have other characteristics of monopolies. (I have added a couple of antitrust books to my Amazon.com wish list should anyone feel moved to contribute.)

December 17, 2010

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

From the call:

Research on the Semantic Web, which is now in its second decade, has had a tremendous success in encouraging people to publish data on the Web in structured, linked, and standardized ways. The success of what has now become the Web of Data can be read from the sheer number of triples available within the Linked-Open Data, Linked Life Data and Open-Government initiatives. However, this growth in data makes many of the established assumptions inappropriate and offers a number of new research challenges.

In stark contrast to early Semantic Web applications that dealt with small, hand-crafted ontologies and data-sets, the new Web of Data comes with a plethora of contradicting world-views and contains incomplete, inconsistent, incorrect, fast-changing and opinionated information. This information not only comes from academic sources and trustworthy institutions, but is often community built, scraped or translated.

In short: the Web of Data is messy, and methods to deal with this messiness are paramount for its future.

Now, we have two choices as the topic map community:

  • congratulate ourselves for seeing this problem long ago, high five each other, etc., or
  • step up and offer topic map solutions that incorporate as much of the existing SW work as possible.

I strongly suggest the second one.

Important dates:

We will aim at an efficient publication cycle in order to guarantee prompt availability of the published results. We will review papers on a rolling basis as they are submitted and explicitly encourage submissions well before the submission deadline. Submit papers online at the journal’s Elsevier Web site.

Submission deadline: 1 February 2011
Author notification: 15 June 2011

Revisions submitted: 1 August 2011
Final decisions: 15 September 2011
Publication: 1 January 2012

December 8, 2010

Semantic Web – Journal Issue 1/1-2

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 8:18 pm

Semantic Web

The first issue of Semantic Web is openly viewable and now online.

In their introductory remarks the editors focus in part on the journal’s subtitle:

The journal’s subtitle – Interoperability, Usability, Applicability – reflects the wide scope of the journal, by putting an emphasis on enabling new technologies and methods. Interoperability refers to aspects such as the seamless integration of data from heterogeneous sources, on-the-fly composition and interoperation of Web services, and next-generation search engines. Usability encompasses new information retrieval paradigms, user interfaces and interaction, and visualization techniques, which in turn require methods for dealing with context dependency, personalization, trust, and provenance, amongst others, while hiding the underlying computational issues from the user. Applicability refers to the rapidly growing application areas of Semantic Web technologies and methods, to the issue of bringing state-of-the-art research results to bear on real-world applications, and to the development of new methods and foundations driven by real application needs from various domains.

Skimming the table of contents I can see lots of opportunity for comments and rejoinders.

For the present I simply commend this new journal and its contents to you for your reading pleasure.

December 7, 2010

Open Provenance Model – Ontology – RDF – Semantic Web

Filed under: Ontology,RDF,Semantic Web — Patrick Durusau @ 11:37 am

A spate of provenance ontology materials landed in my inbox today:

  1. Open Provenance Model Ontology (OPMO)
  2. Open Provenance Model Vocabulary (OPMV)
  3. Open Provenance Model (OPM)
  4. Provenance Vocabulary Mappings

We should count ourselves fortunate that the W3C working group did not title their document: Open Provenance Model Vocabulary Mappings.

The community would be better served with less clever and more descriptive naming.

No doubt the Open Provenance Model Vocabulary (#2 above) has some range of materials in mind.

I don’t know the presumed target but some candidates come to mind:

  • Art Museum Open Provenance Model (including looting/acquisition terms)
  • Library Open Provenance Model
  • Natural History Open Provenance Model
  • ….

I am, of course, giving the authors the benefit of the doubt in presuming their intent was not to create a universal model of provenance.

For topic map purposes, the Provenance Vocabulary Mappings document (#4 above) is the most interesting. Read through it and then answer the questions below.

Questions:

  1. Assume you have yet another provenance vocabulary. On what basis would you map it to any of the other vocabularies discussed in #4?
  2. Most of the mappings in #4 give a rationale. How is that (if it is) different from properties and merging rules for topic maps?
  3. What should we do with mappings in #4 or elsewhere that don’t give a rationale?
  4. How should we represent rationales for mappings? Is there some alternative not considered by topic maps?

Summarize your thoughts in 3-5 pages for all four questions. They are too interrelated to answer separately. You can use citations if you like but these aren’t questions answered in the literature. Or, well, at least I don’t find any of the answers in the literature convincing. 😉 Your experience may vary.

December 5, 2010

idk (I Don’t Know) – Ontology, Semantic Web – Cablegate

Filed under: Associations,Ontology,Roles,Semantic Web,Subject Identity,Topic Maps — Patrick Durusau @ 4:45 pm

While researching the idk (I Don’t Know) post I ran across the suggestion that unknown was not appropriate for an ontology:

Good principles of ontological design state that terms should represent biological entities that actually exist, e.g., functional activities that are catalyzed by enzymes, biological processes that are carried out in cells, specific locations or complexes in cells, etc. To adhere to these principles the Gene Ontology Consortium has removed the terms, biological process unknown ; GO:0000004, molecular function unknown ; GO:0005554 and cellular component unknown ; GO:0008372 from the ontology.

The “unknown” terms violated this principle of sound ontological design because they did not represent actual biological entities but instead represented annotation status. Annotations to “unknown” terms distinguished between genes that were curated when no information was available and genes that were not yet curated (i.e., not annotated). Annotation status is now indicated by annotating to the root nodes, i.e. biological_process ; GO:0008150, molecular_function ; GO:0003674, or cellular_component ; GO:0005575. These annotations continue to signify that a given gene product is expected to have a molecular function, biological process, or cellular component, but that no information was available as of the date of annotation.

Adhering to principles of correct ontology design should allow GO users to take advantage of existing tools and reasoning methods developed by the ontological community. (http://www.geneontology.org/newsletter/archive/200705.shtml, 5 December 2010)

I wonder what the restriction, “…entities that actually exist,” means.

If a leak of documents occurs, a leaker exists, but in a topic map, I would say that was a role, not an individual.

If the unknown person is represented as an annotation to a role, how do I annotate such an annotation with information about the unknown/unidentified leaker?

The leaker being unknown, I don’t think we can get at that with an ontology, at least not directly.

Suggestions?

PS: A topic map can represent unknown functions, etc., as first class subjects (using topics) for an appropriate use case.
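A sketch of what I mean (my own modeling, nothing standard): the unknown leaker is a topic playing a role, so later information attaches to the topic itself rather than to an annotation of an annotation.

    # The leaker is a first-class topic even though nothing identifies them yet.
    topics = {
        "t1": {"names": ["Cablegate leak"], "type": "leak-event"},
        "t2": {"names": [], "type": "person"},   # the leaker: no name, no identifier
    }

    associations = [
        {"type": "leaked", "roles": {"leak": "t1", "leaker": "t2"}},
    ]

    # As evidence surfaces, it attaches to the topic directly:
    topics["t2"]["names"].append("person described in court filings")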

December 3, 2010

Declared Instance Inferences (DI2)? (RDF, OWL, Semantic Web)

Filed under: Inference,OWL,RDF,Semantic Web,Subject Identity — Patrick Durusau @ 8:49 am

In recent discussions of identity, I have seen statements that OWL reasoners could infer that two or more representatives stood for the same subject.

That’s useful, but I wondered if the inferencing overhead is necessary in all such cases.

If a user recognizes that a subject representative (a subject proxy in topic map terms) represents the same subject as another representative, a declarative statement avoids the need for artificial inferencing.

I am sure there are cases where inferencing is useful, particularly to suggest inferences to users, but declared inferences could reduce that need and the overhead.

Declarative information artifacts could be created that contain rules for known identifications.

For example, gene names found in PubMed. If two or more names are declared to refer to the same gene, where is the need for inferencing?

With such declarations in place, no reasoner has to “infer” anything about those names.

Declared instance inferences (DI2) reduce semantic dissonance, inferencing overhead and uncertainty.

Looks like a win-win situation to me.
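Here is a sketch of what a declared instance inference might look like in RDF terms, using owl:sameAs as the declaration (my choice for the example; the gene names and namespace are illustrative):

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDFS

    GENE = Namespace("http://example.org/gene/")   # hypothetical namespace

    g = Graph()
    # The identification is declared up front, not discovered by a reasoner.
    g.add((GENE.TP53, OWL.sameAs, GENE.p53))
    g.add((GENE.TP53, RDFS.label, Literal("tumor protein p53")))

    def labels_for(node, graph):
        """Collect labels for a node and everything declared sameAs it."""
        same = {node}
        for s, _, o in graph.triples((None, OWL.sameAs, None)):
            if s in same or o in same:
                same |= {s, o}
        return [str(label) for n in same for label in graph.objects(n, RDFS.label)]

    print(labels_for(GENE.p53, g))   # ['tumor protein p53'], by lookup alone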

*****
PS: It occurs to me that ontologies are also “declared instance inferences” upon which artificial reasoners rely. The instances happen to be classes and not individuals.

November 28, 2010

Names, Identifiers, LOD, and the Semantic Web

Filed under: LOD,Names,RDF,Semantic Web,Subject Identifiers — Patrick Durusau @ 5:28 pm

I have been watching the identifier debate in the LOD community with its revisionists, personal accounts and other takes on what the problem is, if there is a problem and how to solve the problem if there is one.

I have a slightly different question: What happens when we have a name/identifier?

Short of being present when someone points to or touches an object (themselves, or you, if they are the TSA) and says a name or identifier, what happens?

Try this experiment. Take a sheet of paper and write: George W. Bush.

Now write 10 facts about George W. Bush.

Please circle the ones you think must match to identify George W. Bush.

So, even though you knew the name George W. Bush, isn’t it fair to say that the circled facts are what you would use to identify George W. Bush?

Here’s the fun part: Get a colleague or co-worker to do the same experiment. (Substitute Lady Gaga if your friends don’t know enough facts about George W. Bush.)

Now compare several sets of answers for the same person.

Working from the same name, you most likely listed different facts and different ones you would use to identify that subject.

Even though most of you would agree that some or all of the facts listed go with that person.

It sounds like even though we use identifiers/names, those just clue us in on facts, some of which we use to make the identification.

That’s the problem isn’t it?

A name or identifier can make us think of different facts (possibly identifying different subjects) and even if the same subject, we may use different facts to identify the subject.

Assuming we are at a set of facts (RDF graph, whatever) we need to know: What facts identify the subject?

And a subject may have different identifying properties, depending on the context of identification.
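To make the problem concrete, a small sketch (the facts are illustrative, pulled from common knowledge): two readers circle different identifying subsets, and only a recorded identity basis makes the mismatch visible.

    facts_a = {"43rd US president", "governor of Texas", "born 1946"}
    circled_a = {"43rd US president"}                 # A's identifying basis

    facts_b = {"son of George H. W. Bush", "born 1946", "owned the Texas Rangers"}
    circled_b = {"son of George H. W. Bush", "born 1946"}

    print(circled_a & circled_b)   # set(): no shared identifying facts
    print(facts_a & facts_b)       # {'born 1946'}: shared, but A never circled it

    def same_subject(basis, facts):
        """An identification succeeds when every required fact is present."""
        return basis <= facts

    print(same_subject(circled_b, facts_a))   # False: A never recorded the parentage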

Questions:

  1. How to specify essential facts for identification as opposed to the extra ones?
  2. How to answer #1 for an RDF graph?
  3. How do you make others aware of your answer in #2?

Comments/suggestions?

Ontologies, Semantic Data Integration, Mono-ontological (or not?)

Filed under: Marketing,Medical Informatics,Ontology,Semantic Web,Topic Maps — Patrick Durusau @ 10:21 am

Ontologies and Semantic Data Integration

Somewhat dated, 2005, but still interesting.

I was particularly taken with:

First, semantics are used to ensure that two concepts, which might appear in different databases in different forms with different names, can be described as truly equivalent (i.e. they describe the same object). This can be obscured in large databases when two records that might have the same name actually describe two different concepts in two different contexts (e.g. ‘COLD’ could mean ‘lack of heat’, ‘chronic obstructive lung disorder’ or the common cold). More frequently in biology, a concept has many different names during the course of its existence, of which some might be synonymous (e.g. ‘hypertension’ and ‘high blood pressure’) and others might be only closely related (e.g. ‘Viagra’, ‘UK92480’ and ‘sildenafil citrate’).

In my view you could substitute “topic map” everywhere he says ontology, well, except one.

With a topic map, you and I can have the same binding points for information about particular subjects and yet not share the same ontological structure.

Let me repeat that: With a topic map we can share (and update) information about subjects, even though we don’t share a common ontology.

You may have a topic map that reflects a political history of the United States over the last 20 years and in part it exhibits an ontology that reflects elected offices and their office holders.

For the same topic map, to which I contribute information concerning those office holders, I might have a very different ontology, involving offices in The Hague.

The important fact is that we could both contribute information about the same subjects and benefit from the information entered by others.
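A sketch of that sharing (my own notation; the identifier and vocabularies are made up): both contributions bind to the same subject identifier while each keeps its own types.

    SUBJECT = "http://example.org/person/office-holder-42"   # shared identifier

    us_view = {"subject": SUBJECT, "type": "us:ElectedOfficial",
               "occurrences": {"us:office": "Senator"}}

    hague_view = {"subject": SUBJECT, "type": "icj:Official",
                  "occurrences": {"icj:office": "Registrar"}}

    # Merging on the shared identifier pools the information without forcing
    # either contributor to adopt the other's ontology:
    merged = {"subject": SUBJECT,
              "types": {us_view["type"], hague_view["type"]},
              "occurrences": {**us_view["occurrences"], **hague_view["occurrences"]}}
    print(merged)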

To put it another way, the difference is being mono-ontological or not.

Questions:

  1. Is “mono-ontological” another way of saying “ontologically/logically” consistent? (3-5 pages, citations if you like)
  2. What are the advantages of mono-ontological systems? (3-5 pages, citations)
  3. What are the disadvantages of mono-ontological systems? (3-5 pages, citations)

November 26, 2010

Scalable reduction of large datasets to interesting subsets

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 11:04 am

Scalable reduction of large datasets to interesting subsets

Authors: Gregory Todd Williams, Jesse Weaver, Medha Atre, James A. Hendler

Keywords: Billion Triples Challenge, Scalability, Parallel, Inferencing, Query, Triplestore

Abstract:

With a huge amount of RDF data available on the web, the ability to find and access relevant information is crucial. Traditional approaches to storing, querying, and reasoning fall short when faced with web-scale data. We present a system that combines the computational power of large clusters for enabling large-scale reasoning and data access with an efficient data structure for storing and querying the accessed data on a traditional personal computer or other resource-constrained device. We present results of using this system to load the 2009 Billion Triples Challenge dataset, materialize RDFS inferences, extract an “interesting” subset of the data using a large cluster, and further analyze the extracted data using a personal computer, all in the order of tens of minutes.

I wonder about the use of the phrase “…web-scale data?”

If a billion triples is a real challenge, then what happens when RDF/RDFa is deployed across an entity- and inference-rich body of material like legal texts? Or property descriptions? Or the ownership rights based on property descriptions?

In any event, the prep of the data for inferencing illustrates a use case for topic maps:

Information about people is represented in different ways in the BTC2009 dataset, including the use of the FOAF,7 SIOC,8 DBpedia,9 and AKT10 ontologies. We create a simple upper ontology to bring together concepts and properties pertaining to people. For example, we define the class up:Person which is defined as a superclass to existing person classes, e.g., foaf:Person. We do the same for relevant properties, e.g., up:full name is a superproperty of akt:full-name. Note that “up” is the namespace prefix for our upper ontology.

What subject represented by akt:full-name was responsible for the mapping in question? How does that translate to other ontologies? Oh, sorry, no place to record that mapping.
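For reference, the quoted prep step reduces to a pair of RDFS declarations. A sketch (the namespaces are from memory; treat them as illustrative) also shows what such a mapping leaves unsaid:

    from rdflib import Graph, Namespace
    from rdflib.namespace import FOAF, RDFS

    UP = Namespace("http://example.org/up/")                   # stand-in for "up:"
    AKT = Namespace("http://www.aktors.org/ontology/portal#")  # from memory

    g = Graph()
    g.add((FOAF.Person, RDFS.subClassOf, UP.Person))
    g.add((AKT["full-name"], RDFS.subPropertyOf, UP.full_name))

    # An RDFS reasoner can now materialize up:Person and up:full_name triples
    # from FOAF and AKT data. What the graph cannot record: who made the
    # mapping, on what basis, or how it translates to other ontologies.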

Questions:

  1. How do you evaluate the claims of “…web-scale data?” (3-5 pages, citations)
  2. Does creating ad-hoc upper ontologies scale? Yes/No/Why? (3-5 pages, citations)
  3. How does interchange of ad-hoc upper ontologies work? (3-5 pages, citations)

Managing Terabytes of Web Semantics Data

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 11:00 am

Managing Terabytes of Web Semantics Data

Authors: Michele Catasta, Renaud Delbru, Nickolai Toupikov, and Giovanni Tummarello

Abstract:

A large amount of semi structured data is now made available on the Web in form of RDF, RDFa and Microformats. In this chapter, we discuss a general model for the Web of Data and, based on our experience in Sindice.com, we discuss how this is reflected in the architecture and components of a large scale infrastructure. Aspects such as data collection, processing, indexing, ranking are touched, and we give an ample example of an applications built on top of said infrastructure.

Appears as Chapter 6 in R. De Virgilio et al. (eds.), Semantic Web Information Management, © Springer-Verlag Berlin Heidelberg 2010.

Hopefully not too repetitious with the other Sindice.com material I have been posting.

It is a good overview of the area, in addition to specifics about Sindice.com.

Semantic Now?

Filed under: Navigation,OWL,RDF,Semantic Web,Topic Maps — Patrick Durusau @ 10:58 am

Visit Semantic Web, then return here (or use a separate browser window).

I went to the Semantic Web page of the W3C looking for a prior presentation and was struck by the semantic now nature of the page.

It isn’t clear how to access older material.

I have to confess to having only a passing interest in self-promotional, puff pieces, including logos.

I assume that is true for many of the competent researchers working with the W3C. (There are a lot of them, this is not a criticism of their work.)

So, where is the interface that enables quick access to substantial materials, including older standards, statements and presentations?

*****
I understand at least some of the W3C site is described in RDF. To what degree of detail and precision, I don’t know. It would make a starting point for a topic map of the site.

The other necessary component, and where this page falls down, would be useful navigation choices. That would be the harder problem.

Let me know if you are interested in cracking this nut.

Another Take on the Semantic Web?

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 10:56 am

Bob Ferris constructs a take on the SW at: On Resources, Information Resources and Documents.

Whatever you think of Bob’s vision of the SW, the fundamental problem is one of requiring universal use of a flat identifier (URI).

Which leaves us with string comparison. Different string, different thing being identified.

Some of the better SW software now evaluates RDF graphs for identification of entities.

Not all that different from how we identify entities.

Departs from the URI = Identifier basis of the SW, but to be useful, that was inevitable.

Two more challenges face the SW (where topic maps can help, there are others):

1) How to communicate to other users what parts of an RDF graph to match for identity purposes? (including matching on subparts)

2) How to communicate to other users when non-isomorphic RDF graphs are semantically equivalent?
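Challenge (1) is hard because RDF itself gives you graph isomorphism, not identity conditions. A sketch with rdflib’s compare module shows the gap (the data is illustrative):

    from rdflib import Graph
    from rdflib.compare import isomorphic

    ttl = """
    @prefix ex: <http://example.org/> .
    ex:a ex:knows [ ex:name "Alice" ] .
    """

    g1 = Graph().parse(data=ttl, format="turtle")
    g2 = Graph().parse(data=ttl, format="turtle")

    print(isomorphic(g1, g2))   # True: same shape despite distinct blank nodes

    # Challenge (2) is the converse and harder case: two graphs that are NOT
    # isomorphic (ex:name in one, foaf:name in the other) may still describe
    # the same subject, and nothing in either graph records that equivalence.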

More on those issues anon.

November 25, 2010

Sig.ma – Live views on the Web of Data

Filed under: Indexing,Information Retrieval,Lucene,Mapping,RDF,Search Engines,Semantic Web — Patrick Durusau @ 10:27 am

Sig.ma – Live views on the Web of Data

From the website:

In Sig.ma, elements such as large scale semantic web indexing, logic reasoning, data aggregation heuristics, pragmatic ontology alignments and, last but not least, user interaction and refinement, all play together to provide entity descriptions which become live, embeddable data mash ups.

Read one of various versions of an article on Sig.ma for the technical details.

From the Web Technologies article cited on the homepage:

Sig.ma revolves around the creation of Entity Profiles. An entity profile – which in the Sig.ma dataflow is represented by the “data cache” storage (Fig. 3) – is a summary of an entity that is presented to the user in a visual interface, or which can be returned by the API as a rich JSON object or a RDF document. Entity profiles usually include information that is aggregated from more than one source. The basic structure of an entity profile is a set of key-value pairs that describe the entity. Entity profiles often refer to other entities, for example the profile of a person might refer to their publications.

No, this isn’t an implementation of the TMRM.

This is an implementation of one way to view entities for a particular type of data. A very exciting one but still limited to a particular data set.

This is a big step forward.

For example, it isn’t hard to imagine entity profiles against particular websites or data sets. Entity profiles that are maintained and leased for use with search engines like Sig.ma.

Or going a bit further and declaring a basis for identification of subjects, such as the existence of properties a…n in an RDF graph.
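Following the quoted description, an entity profile is easy to sketch: key-value pairs aggregated from more than one source, with the source kept for provenance (the URLs and values below are made up):

    from collections import defaultdict

    sources = {
        "http://example.org/homepage.rdf": {"name": "J. Smith", "affiliation": "DERI"},
        "http://example.org/dblp.rdf":     {"name": "John Smith", "paper": "Sig.ma"},
    }

    profile = defaultdict(list)
    for url, pairs in sources.items():
        for key, value in pairs.items():
            profile[key].append({"value": value, "source": url})

    print(dict(profile))
    # Both spellings of the name survive side by side, which is exactly where
    # user refinement, or a declared identity basis, enters the dataflow.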

Questions:

  1. Spend a couple of hours with Sig.ma researching library related questions. (Discussion)
  2. What did you like, dislike or find surprising about Sig.ma? (3-5 pages, no citations)
  3. Entity profiles for library science (Class project)

Sig.ma: Live Views on the web of data – bibliography issues

I normally start with a DOI here so you can see the article in question.

Not here.

Here’s why:

Sig.ma: Live views on the Web of Data Journal of Web Semantics. (10 pages)

Sig.ma: Live Views on the Web of Data WWW ’10 Proceedings(demo, 4 pages)

Sig.ma: Live Views on the Web of Data (8 pages) http://richard.cyganiak.de/2008/papers/sigma-semwebchallenge2009.pdf

Sig.ma: Live Views on the Web of Data (4 pages) http://richard.cyganiak.de/2008/papers/sigma-demo-www2010.pdf

Sig.ma: Live Views on the Web of Data (25 pages) http://fooshed.net/paper/JWS2010.pdf

Before saying anything ugly, ;-), this is some of the most exciting research I have seen in a long time. I will cover that part of it in a following post. But, to the matter at hand, bibliographic control.

Five (5) different articles, two of them published in recognized journals, all with the same name? (The demo articles are the same but have different headers/footers, page numbers and so would likely be indexed as different articles.)

I will be able to resolve any confusion by obtaining the article in question.

But that isn’t an excuse.

I, along with everyone else interested in this research, will waste a small part of our time resolving the confusion. Confusion that could have been avoided for everyone.

Not unlike everyone who does the same search having to tread the same google glut.

With no way to pass on what we have resolved, for the benefit of others.

Questions:

  1. Help these authors out. How would you suggest they avoid this in the future? Use of the name is important. (3-5 pages, no citations)
  2. Help the library out. How will you deal with multiple papers with the same title, authors, pub year? (this isn’t uncommon) (3-5 pages, citations optional)
  3. How would you use topic maps to resolve this issue? (3-5 pages, no citations)

Virtuoso Open-Source Edition

Filed under: Linked Data,RDF,Semantic Web,Software — Patrick Durusau @ 7:06 am

Virtuoso Open-Source Edition

I ran across Virtuoso while running down the references in the article on SIREn. (Yes, I check references, not all of them, just the most interesting ones, as time permits.)

Has partial support for a variety of “Semantic Web” technologies.

Is the basis for OpenLink Data Spaces.

A named structured data cluster within a distributed data network where each item of data (each “datum”) has a unique identifier. Fundamental characteristics of data spaces include:

  • Each Data Item (or Entity) is endowed with a unique HTTP-based Identifier
  • Entity Identity, Access, and Representation are each distinct from the others
  • Entities are interlinked via attributes and relationship properties
  • Creation, Update, and Deletion privileges are controlled by the space owner

I can think of lots of “data spaces,” Large Hadron Collider data, radio and optical astronomy data dumps, TCP/IP data streams, bioinformatics data, commercial transaction databases that don’t fit this description. Please submit your own.

Still, if you want to learn the ins and outs as well as the limitations of this approach, it costs nothing more than the time to download the software.

November 23, 2010

Querying the British National Bibliography

Filed under: British National Bibliography,Dataset,RDF,Semantic Web,SPARQL — Patrick Durusau @ 9:40 am

Querying the British National Bibliography

From the webpage:

Following up on the earlier announcement that the British Library has made the British National Bibliography available under a public domain dedication, the JISC Open Bibliography project has worked to make this data more useable.

The data has been loaded into a Virtuoso store that is queriable through the SPARQL Endpoint and the URIs that we have assigned each record use the ORDF software to make them dereferencable, supporting content auto-negotiation as well as embedding RDFa in the HTML representation.

The data contains some 3 million individual records and some 173 million triples. …

The data is also available for local processing but it isn’t much of a “web” if the first step is to always download a local copy of the data.

It should be interesting to watch for projects that combine the results of queries against this data with the results of other queries against other data sets. Particularly if those other data sets follow different metadata regimes.

Isn’t that the indexing problem all over again?
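If you want to poke at the endpoint, a minimal SPARQL client looks like the sketch below. The endpoint URL is a placeholder (use the one in the announcement) and the Dublin Core property is a guess at the schema:

    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("http://example.org/bnb/sparql")   # placeholder URL
    endpoint.setQuery("""
        PREFIX dct: <http://purl.org/dc/terms/>
        SELECT ?book ?title WHERE {
            ?book dct:title ?title .
            FILTER regex(?title, "topic map", "i")
        } LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)

    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["book"]["value"], "-", row["title"]["value"])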

Questions:

  1. What data set would you want to combine with British National Bibliography (BNB)?
  2. What issues do you see arising from combining the BNB with your data set? (3-5 pages, no citations)
  3. Combining the BNB with another data set. (project)

November 21, 2010

Ontology Based Graphical Query Language Supporting Recursion

Filed under: Ontology,Query Language,Semantic Web,Visual Query Language — Patrick Durusau @ 7:55 am

Ontology Based Graphical Query Language Supporting Recursion

Author(s): Arun Anand Sadanandan, Kow Weng Onn and Dickson Lukose

Keywords: Visual Query Languages, Visual Query Systems, Visual Semantic Query, Graphical Recursion, Semantic Web, Ontologies

Abstract:

Text based queries often tend to be complex, and may result in non user friendly query structures. However, querying information systems using visual means, even for complex queries has proven to be more efficient and effective as compared to text based queries. This is owing to the fact that visual systems make way for better human-computer communication. This paper introduces an improved query system using a Visual Query Language. The system allows the users to construct query graphs by interacting with the ontology in a user friendly manner. The main purpose of the system is to enable efficient querying on ontologies even by novice users who do not have an in-depth knowledge of internal query structures. The system also supports graphical recursive queries and methods to interpret recursive programs from these visual query graphs. Additionally, we have performed some preliminary usability experiments to test the efficiency and effectiveness of the system.

From the abstract I was expecting visual representation of the subjects that form the query. The interface remains abstract but is a good step in the direction of a more useful query interface for the non-expert. (Which we all are in some domain.)

Questions:

  1. Compare to your experience with query language interfaces. (3-5 pages, no citations)
  2. Are recursive queries important for library catalogs? (3-5 pages, no citations, but use examples to make your case, pro or con)
  3. Suggestions for a visual query language for the current TMQL draft? (research project)

November 19, 2010

“…an absolute game changer”

Filed under: Linked Data,LOD,Marketing,Semantic Web — Patrick Durusau @ 1:27 pm

Aldo Bucchi writes that http://uriburner.com/c/DI463N is:

Single most powerful demo available. Really looking fwd to what’s coming next.

Let’s see how this shifts gears in terms of Linked Data comprehension.
Even in its current state, this is an absolute game changer.

I know this was not easy. My hat goes off to the team for their focus.

Now, just let me send this link out to some non-believers that have
been holding back my evangelization pipeline 😉

I may count as one of the “non-believers.” 😉

Before Aldo throws open the flood gates on his “evangelization pipeline,” let me observe:

The elderly gentleman appears in: Tropical grassland, Desert, Temperate grassland, Coniferous forest, Flooded grassland, Mountain grassland, Broadleaf forest, Tropical dry forest, Rainforest, Taiga, Tundra, Urban, Tropical coniferous forests, Mountains, Coastal, and Wetlands.

So he must get around a lot.

Only the BBC appears in Estuaries.

Granted, it is a clever presentation of subjects that share a common locale, and it works fairly responsively, but that hardly qualifies as a “…game changer…”

This project is a good experiment on making information more accessible.

Why aren’t the facts enough?
