Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

July 15, 2011

RDFaCE WYSIWYM RDFa Content Editor

Filed under: RDFa,Semantic Web — Patrick Durusau @ 6:47 pm

RDFaCE WYSIWYM RDFa Content Editor

From the announcement:

RDFaCE is an online RDFa content editor based on TinyMCE. In addition to two classical views for text authoring (WYSIWYG and HTML source code), RDFaCE supports two novel views for semantic content authoring: WYSIWYM (What You See Is What You Mean), which highlights semantic annotations directly inline, and a triple view (a.k.a. fact view). Further features are:

  • use of different Web APIs (Prefix.cc, Sindice, Swoogle) to facilitate the semantic content authoring process.
  • combining of results from multiple NLP APIs (Alchemy, Extractive, Ontos, Evri, OpenCalais) for obtaining rich automatic semantic annotations that can be modified and extended later on.

This is very clever and a step forward for the Semantic Web.
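
To make the multi-API idea concrete, here is a minimal sketch of combining entity annotations by simple voting. The service names and outputs are invented; the real APIs differ in endpoints and response formats, and RDFaCE’s own merging logic is surely more sophisticated.

from collections import Counter

def combine(annotations_per_service, min_votes=2):
    # Each service contributes one vote per (text span, entity type) pair;
    # keep only the annotations that enough services agree on.
    votes = Counter()
    for annotations in annotations_per_service:
        votes.update(set(annotations))
    return {a for a, n in votes.items() if n >= min_votes}

alchemy = [("Berlin", "City"), ("ACME", "Company")]    # hypothetical output
opencalais = [("Berlin", "City"), ("Monday", "Date")]  # hypothetical output
print(combine([alchemy, opencalais]))                  # {('Berlin', 'City')}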

July 12, 2011

DataSift

Filed under: Marketing,Semantic Web,Semantics — Patrick Durusau @ 7:11 pm

DataSift

Congratulations to DataSift for the $6 Million in funding!

But there is another gem in the story about their funding.

Instead Mr. Halstead looked at how companies like Amazon had disrupted server rental and came up with a plan to do the same to data analysis. “For me the technology isn’t the game changer. For me it is approaching data processing in a democratized way. There are any number of companies that will sell you data, but they will typically charge you a five-figure sum minimum.”

Note the line: “For me the technology isn’t the game changer. For me it is approaching data processing in a democratized way.”

That’s the trick, isn’t it? To approach “…semantics in a democratized way.”

Precisely what topic maps have to offer. Topic maps can capture your semantics. If and when you become interested in the semantics of others, you can map your semantics to theirs, preserving the integrity of both.
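
A toy sketch of that idea: keep both vocabularies intact and express the mapping as a separate artifact that records the basis for each equivalence. All terms and data below are invented.

my_terms = {"client": ["Acme Corp.", "Globex"]}
their_terms = {"customer": ["Initech"]}
# Neither vocabulary is rewritten; the mapping carries its own justification.
mapping = {("client", "customer"): "both identify the party billed for services"}

def lookup(term):
    results = list(my_terms.get(term, []))
    for (mine, theirs), basis in mapping.items():
        if term == mine:
            results.extend(their_terms.get(theirs, []))
    return results

print(lookup("client"))   # ['Acme Corp.', 'Globex', 'Initech']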

Disruptive to the top-down ontology approach, you ask? I suppose it is. But then democratization is always disruptive of authoritarian schemes and patterns.

The real question is: Whose semantics would you rather have? Your own or those of someone else?

July 9, 2011

Big Data and the Semantic Web

Filed under: BigData,Semantic Web — Patrick Durusau @ 7:01 pm

Big Data and the Semantic Web

Edd Dumbill sees Big Data as answering the semantic questions posed by the Semantic Web:

Conventionally, semantic web systems generate metadata and identified entities explicitly, ie. by hand or as the output of database values. But as anybody who’s tried to get users to do it will tell you, generating metadata is hard. This is part of why the full semantic web dream isn’t yet realized. Analytical approaches take a different approach: surfacing and classifying the metadata from analysis of the actual content and data itself. (Freely exposing metadata is also controversial and risky, as open data advocates will attest.)

Once big data techniques have been successfully applied, you have identified entities and the connections between them. If you want to join that information up to the rest of the web, or to concepts outside of your system, you need a language in which to do that. You need to organize, exchange and reason about those entities. It’s this framework that has been steadily built up over the last 15 years with the semantic web project.

To give an already widespread example: many data scientists use Wikipedia to help with entity resolution and disambiguation, using Wikipedia URLs to identify entities. This is a classic use of the most fundamental of semantic web technologies: the URI.
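
As a minimal illustration of the convention Edd describes, here is a sketch that turns a mention into a Wikipedia URL to serve as an identifier. Real entity resolution needs context and candidate ranking; this shows only the identifier convention.

def wikipedia_uri(mention):
    # Wikipedia titles replace spaces with underscores and
    # capitalize the first letter.
    title = mention.strip().replace(" ", "_")
    return "http://en.wikipedia.org/wiki/" + title[:1].upper() + title[1:]

print(wikipedia_uri("semantic web"))   # http://en.wikipedia.org/wiki/Semantic_web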

I am not sure where Edd gets: “Once big data techniques have been successfully applied, you have identified entities and the connections between them.” Really? Or is that something hoped for in the future? A general solution to entity extraction and discovery of relationships remains a research topic.

Big Data will worsen the semantic poverty of the Semantic Web and drive the search for tools and approaches to address that poverty.

July 6, 2011

STI Innsbruck

Filed under: OWL,RDF,RDFa,Semantic Web — Patrick Durusau @ 2:12 pm

STI Innsbruck

From the about page:

The Semantic Technology Institute (STI) Innsbruck, formerly known as DERI Innsbruck, was founded by Univ.-Prof. Dr. Dieter Fensel in 2002 and has developed into a challenging and dynamic research institute of approximately 40 people. STI Innsbruck collaborates with an international network of institutes in Asia, Europe and the USA, as well as with a number of global industrial partners.

STI Innsbruck is a founding member of STI International, a collaborative association of leading European and world wide initiatives, ensuring the success and sustainability of semantic technology development. STI Innsbruck utilizes this network, as well as contributing to it, in order to increase the impact of the research conducted within the institute. For more details on Semantics, check this interview with Frank Van Harmelen: “Search and you will find“.

I won’t try to summarize the wealth of resources you will find at STI Innsbruck. From the reading list for the curriculum to the listing of tools and publications, you will certainly find material of interest at this site.

For an optimistic view of Semantic Web activity, see the interview with Frank van Harmelen.

Joint International Semantic Technology Conference (JIST2011)

Filed under: Conferences,OWL,RDF,Semantic Web — Patrick Durusau @ 2:10 pm

Joint International Semantic Technology Conference (JIST2011) Dec. 4-7, 2011, Hangzhou, China

Important Dates:

– Submissions due: August 15, 2011, 23:59 (11:59pm) Hawaii time
– Notification: September 22, 2011, 23:59 (11:59pm) Hawaii time
– Camera ready: October 3, 2011, 23:59 (11:59pm) Hawaii time
– Conference dates: December 4-7, 2011

From the call:

The Joint International Semantic Technology Conference (JIST) is a regional federation of Semantic Web related conferences. The mission of JIST is to bring together researchers in disciplines related to the semantic technology from across the Asia-Pacific Region. JIST 2011 incorporates the Asian Semantic Web Conference 2011 (ASWC 2011) and Chinese Semantic Web Conference 2011 (CSWC 2011).

Prof. Ian Horrocks (Oxford University) is scheduled to present a keynote address.

July 5, 2011

A Survey On Games For Knowledge Acquisition

Filed under: Authoring Semantics,Games,Semantic Web,Semantics — Patrick Durusau @ 1:41 pm

A Survey On Games For Knowledge Acquisition by Stefan Thaler, Katharina Siorpaes, Elena Simperl, and Christian Hofer.

Abstract:

Many people dedicate their free time to playing games or following game-related activities. The Casual Games Market Report 2007 [3] names games with more than 300 million downloads. Moreover, the Casual Games Association reports more than 200 million casual gamers worldwide [4]. People play them for various reasons, such as to relax, to be entertained, for the need of competition and to be thrilled [9]. Additionally they want to be challenged, mentally as well as skill-based. As mentioned earlier, there are tasks that are relatively easy for humans to complete but computationally rather infeasible to solve [27]. The idea of integrating such tasks as the goal of games has been created and realized in platforms such as OntoGame [21], GWAP [26] and others. Consequently, they have produced a win-win situation where people have fun playing games while actually doing something useful, namely producing output data which can be used to improve the experience when dealing with data. That is why we describe state-of-the-art games here. Firstly, we briefly introduce games for knowledge acquisition. Then we outline various games for semantic content creation we found, grouped by the task they attempt to fulfill. We then provide an overview of these games based on various criteria in tabular form.

Interesting survey of the field that will hopefully be updated every year or even made into an online resource that can change as new games emerge.

Curious about two possibilities for semantic games:

1) Has anyone made a first-person shooter game based on recognition of facial images of politicians? Thinking that if you were given a set of “bad” guys to recognize for each level, you could shoot those plus the usual combatants. The images in the game would be drawn from news footage, etc. Thinking this might attract political devotees. I even have a good name for it: “Term Limits.”

2) On the theory that there is no one nosier than a neighbor, why not create an email tagging game where anonymous co-workers get to tag your email (both in and out)? That would be one way to add semantic value to corporate email and generate a lot of interest in doing so. Possible name: “Heard at Water Cooler.”

July 4, 2011

Digital Diplomatics 2011

Filed under: Conferences,Semantic Web,Semantics — Patrick Durusau @ 6:06 pm

Digital Diplomatics 2011: Tools for the Digital Diplomatist (program)

From the Call for Papers:

Scholars studying medieval documents never had a fundamental opposition to using modern technology to support their research. Nevertheless, no technology since the introduction of photography has had such an impact on the questions and methods of diplomatics as the computer: digital imaging gives us cheap reproductions at high quality, so nowadays large corpora of documents are to be found online. Digital imaging allows manipulations that make apparently invisible traces visible. Modern information technology gives us access to huge text corpora in which single words and phrases can be found, thus helping to indicate relationships, to retrieve parallel texts for comparison, or to plot geographical and temporal distributions.

The conference aims at presenting projects which are working to enlarge the digitised charter corpus on the one hand, and on the other hand will put a particular focus on research applying information technology to medieval and early modern charters, aiming at purely diplomatic questions as well as historic or philologic research.

The organizers of the conference therefore invite proposals dealing with questions like:

  • How can we improve the access to digital charter corpora?
  • How can the presentation of digital charter corpora help research with them?
  • Are there experiences in the application of complex information technologies (like named entity recognition, ontologies, data-mining, text-mining, automatic authorship identification, pattern analysis, optical character recognition, advanced statistics etc.) for diplomatic research?
  • Have digital charter corpora developed new research interests?
  • Are there old research questions to be tackled by the digital technologies and digital charter corpora?
  • Which well-established methods can’t be accelerated by digital technologies?
  • How far has the internet changed scholarly communication in diplomatics?
  • How do you shape digitization projects of charters to meet research needs?

The papers on this program address some of the areas that made me interested in topic maps.

Commercial semantic issues pale beside those of academic textual analysis and research.

June 29, 2011

Providing and discovering definitions of URIs

Filed under: Identifiers,Linked Data,LOD,OWL,RDF,Semantic Web — Patrick Durusau @ 9:10 am

Providing and discovering definitions of URIs by Jonathan A. Rees.

Abstract:

The specification governing Uniform Resource Identifiers (URIs) [rfc3986] allows URIs to mean anything at all, and this unbounded flexibility is exploited in a variety of contexts, notably the Semantic Web and Linked Data. To use a URI to mean something, an agent (a) selects a URI, (b) provides a definition of the URI in a manner that permits discovery by agents who encounter the URI, and (c) uses the URI. Subsequently other agents may not only understand the URI (by discovering and consulting the definition) but may also use the URI themselves.

A few widely known methods are in use to help agents provide and discover URI definitions, including RDF fragment identifier resolution and the HTTP 303 redirect. Difficulties in using these methods have led to a search for new methods that are easier to deploy, and perform better, than the established ones. However, some of the proposed methods introduce new problems, such as incompatible changes to the way metadata is written. This report brings together in one place information on current and proposed practices, with analysis of benefits and shortcomings of each.

The purpose of this report is not to make recommendations but rather to initiate a discussion that might lead to consensus on the use of current and/or new methods.

The criteria for success:

  1. Simple. Having too many options or too many things to remember makes discovery fragile and impedes uptake.
  2. Easy to deploy on Web hosting services. Uptake of linked data depends on the technology being accessible to as many Web publishers as possible, so should not require control over Web server behavior that is not provided by typical hosting services.
  3. Easy to deploy using existing Web client stacks. Discovery should employ a widely deployed network protocol in order to avoid the need to deploy new protocol stacks.
  4. Efficient. Accessing a definition should require at most one network round trip, and definitions should be cacheable.
  5. Browser-friendly. It should be possible to configure a URI that has a discoverable definition so that ‘browsing’ to it yields information useful to a human.
  6. Compatible with Web architecture. A URI should have a single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name.
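
For concreteness, here is roughly what the 303 method mentioned in the report looks like on the wire, as a sketch using Python’s requests library against invented URIs: a GET on the “thing” URI answers 303 See Other, and the definition lives at the redirect target.

import requests

# GET the URI of a non-document "thing" without following redirects.
resp = requests.get("http://example.org/id/alice", allow_redirects=False)
if resp.status_code == 303:
    # The Location header points to a document that defines the URI.
    definition = requests.get(resp.headers["Location"])
    print(definition.url, definition.status_code)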

I had to look it up to get the page number but I remembered Karl Wiegers in Software Requirements saying:

Feasible

It must be possible to implement each requirement within the known capabilities and limitations of the system and its environment.

The “single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name” requirement is not feasible. It will stymie this project, despite the array of talent on hand, until it is no longer a requirement.

Need proof? Name one URI with a single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name.

Not one that the W3C TAG, or TBL, or anyone else thinks/wants/prays has a single agreed meaning globally, … but one that in fact has such a global meaning.

It’s been more than ten years. Let’s drop the last requirement and let the rather talented group working on this come up with a solution that meets the other five (5) requirements.

It won’t be a universal solution but then neither is the WWW.

LarKC: The Large Knowledge Collider

Filed under: OWL,Semantic Web,Semantics — Patrick Durusau @ 9:04 am

LarKC: The Large Knowledge Collider

A tweet about a video on LarKC sent me looking for the project. From the webpage:

The aim of the EU FP 7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC, for short, pronounced “lark”), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web.

This will be achieved by:

  • Enriching the current logic-based Semantic Web reasoning methods with methods from information retrieval, machine learning, information theory, databases, and probabilistic reasoning,
  • Employing cognitively inspired approaches and techniques such as spreading activation, focus of attention, reinforcement, habituation, relevance reasoning, and bounded rationality.
  • Building a distributed reasoning platform and realizing it both on a high-performance computing cluster and via “computing at home”.

Listening to the video while writing this post, did I hear correctly that data would have to be transformed into a uniform format or vocabulary? I was listening to http://videolectures.net/larkcag09_vanharmelen_llkc/; try around time mark 12:00 and following.

I also noticed on the project homepage:


Start: 01-April-08
End: 30-Sep-11
Duration: 42 months

So, what happens to LarKC on 1-Oct-11?

LDIF – Linked Data Integration Framework

Filed under: LDIF,Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 9:02 am

LDIF – Linked Data Integration Framework 0.1

From the webpage:

The Web of Linked Data grows rapidly and contains data from a wide range of different domains, including life science data, geographic data, government data, library and media data, as well as cross-domain datasets such as DBpedia or Freebase. Linked Data applications that want to consume data from this global data space face the challenges that:

  1. data sources use a wide range of different RDF vocabularies to represent data about the same type of entity.
  2. the same real-world entity, for instance a person or a place, is identified with different URIs within different data sources.

This usage of different vocabularies as well as the usage of URI aliases makes it very cumbersome for an application developer to write SPARQL queries against Web data which originates from multiple sources. In order to ease using Web data in the application context, it is thus advisable to translate data to a single target vocabulary (vocabulary mapping) and to replace URI aliases with a single target URI on the client side (identity resolution), before starting to ask SPARQL queries against the data.

Up till now, there have not been any integrated tools that help application developers with these tasks. With LDIF, we try to fill this gap and provide an initial alpha version of an open-source Linked Data Integration Framework that can be used by Linked Data applications to translate Web data and normalize URI aliases.

More comments will follow, but…

Isn’t this the reverse of the well-known synonym table in IR?

Instead of substituting synonyms in the query expression, the underlying data is being transformed to produce…a lack of synonyms?

No, it is not the reverse of a synonym table. In synonym-table terms, we would lose the synonym table and transform the underlying textual data to use only a single term where before there were N terms, all of which occurred in the synonym table.

If I search for a term previously listed in the synonym table, but one replaced by a common term, my search result will be empty.
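
Here is the point as a toy example: with a query-side synonym table both documents are found, but once the data itself is normalized, the replaced term becomes unfindable.

docs = ["the film was long", "the movie was long"]
synonyms = {"movie": {"movie", "film"}}

def search(term):
    # Expand the query with synonyms; the documents are left untouched.
    terms = synonyms.get(term, {term})
    return [d for d in docs if any(t in d for t in terms)]

print(search("movie"))                          # both documents match
# LDIF-style normalization: rewrite the data itself to a single target term.
normalized = [d.replace("film", "movie") for d in docs]
print([d for d in normalized if "film" in d])   # [] -- "film" no longer findable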

No more synonyms? That sounds like a bad plan to me.

June 24, 2011

It’s official — the grand central EDW will never happen

Filed under: Data Warehouse,Enterprise Integration,Semantic Web — Patrick Durusau @ 10:46 am

It’s official — the grand central EDW will never happen

Curt Monash cites presentations at the Enzee Universe conference by IBM, Merv Adrian (Gartner) and Forrester Research panning the idea of a grand central EDW (Enterprise Data Warehouse).

If that isn’t going to happen for any particular enterprise, does that mean no universal data warehouse, a/k/a, the Semantic Web?

Even if Linked Data were to succeed in linking all data together, that’s the easy part. Useful access has always been a question of mapping semantics and that’s the hard part. The part that requires people in the loop. People like librarians.

June 17, 2011

Moma, What do URLs in RDF Mean?

Filed under: RDF,Semantic Web — Patrick Durusau @ 7:23 pm

Lars Marius Garshol says in a tweet:

The old “how to find what URIs represent information resources in RDF” issue returns, now with real consequences

pointing to: How to find what URLs in an RDF graph refer to information resources?.

You may also be interested in Jeni Tennison’s summary of a recent TAG meeting on the subject:

URI Definition Discovery and Metadata Architecture

The afternoon session on Tuesday was spent on Jonathan Rees’s work on the Architecture of the World Wide Semantic Web, which covers, amongst other things, what people in semantic web circles call httpRange-14. At core, this is about the kinds of URIs we can use to refer to real-world things, what the response to HTTP requests on those URIs should be, and how we find out information about these resources.

Jonathan has put together a document called Providing and discovering definitions of URIs which covers the various ways that have been suggested over time, including the 303 method that was recommended by the TAG in 2005 and methods that have been suggested by various people since that time.

It’s clear that the 303 method has lots of practical shortcomings for people deploying linked data, and isn’t the way in which URIs are commonly used by Facebook and schema.org, who don’t currently care about using separate URIs for documents and the things those documents are about. We discussed these alongside concerns that we continue to support people who want to do things like describe the license or provenance of a document (as well as the facts that it contains) and don’t introduce anything that is incompatible with the ways in which people who have been following recommended practice are publishing their linked data. The general mood was that we need to support some kind of ‘punning’, whereby a single URI could be used to refer to both a document and a real-world thing, with different properties being assigned to different ‘views’ of that resource.

Jonathan is going to continue to work on the draft, incorporating some other possible approaches. It’s a very contentious topic within the linked data community. My opinion is while we need to provide some ‘good practice’ guides for linked data publishers, we can’t just stick to a theoretical ideal that experience has shown not to be practical. What I’d hope is that the TAG can help to pull together the various arguments for and against different options, and document whatever approach the wider community supports.

My suggested “best practice” is to not trust linked data, RDF, or topic maps data unless it is tested (passes) and you trust its point of origin.

Any more than you would print your credit card number and PIN on the side of your car. Blind trust in any data source is a bad idea.

June 13, 2011

Why Schema.org Will Win

Filed under: Ontology,OWL,RDF,Schema,Semantic Web — Patrick Durusau @ 7:04 pm

It isn’t hard to see why schema.org is going to win out over “other” semantic web efforts.

The first paragraph at the schema.org website says why:

This site provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages.

  • Easy: Uses HTML tags
  • Immediate Utility: Recognized by Bing, Google and Yahoo!
  • Immediate Payoff: People can find the right web pages (your web pages)
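
For anyone who has not seen it, this is roughly what such markup looks like. The HTML below is ordinary schema.org microdata; parsing it with the extruct library is just one convenient way to see what a search engine would extract (treat the library call as an assumption, not part of schema.org itself).

import extruct

html = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Patrick Durusau</span>,
  <span itemprop="jobTitle">editor</span>
</div>
"""
# Returns a dict keyed by syntax, e.g. {'microdata': [{'type': ...,
# 'properties': {'name': 'Patrick Durusau', 'jobTitle': 'editor'}}]}
print(extruct.extract(html, syntaxes=["microdata"])["microdata"])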

Ironic that when HTML came on the scene, any number of hypertext engines offered more complex and useful approaches to hypertext.

But the advantages of HTML were:

  • Easy: Used simple tags
  • Immediate Utility: Useful to the author
  • Immediate Payoff: Joins hypertext network for others to find (your web pages)

I think the third advantage in each case is the crucial one. We are vain enough that making our information more findable is a real incentive, if there is a reasonable expectation of it being found. Today or tomorrow. Not ten years from now.

June 7, 2011

Machine Learning and Knowledge Discovery for Semantic Web

Filed under: Machine Learning,Semantic Web — Patrick Durusau @ 6:20 pm

Machine Learning and Knowledge Discovery for Semantic Web

Description:

Machine Learning and the Semantic Web are covering conceptually different sides of the same story – the Semantic Web’s typical approach is top-down modeling of knowledge, proceeding down towards the data, while Machine Learning is an almost entirely data-driven, bottom-up approach trying to discover the structure in the data and express it in more abstract ways and rich knowledge formalisms. The talk will discuss possible interaction and usage of Machine Learning and Knowledge Discovery for the Semantic Web with emphasis on ontology construction. In the second half of the talk we will take a look at some research using machine learning for the Semantic Web and demos of the corresponding prototype systems.

Slides.

The presentation runs 80+ minutes but three quick points:

First, the “Semi-Automatic Data-Driven Ontology Construction” tool (http://ontogen.ijs.si), looked at from a slightly different point of view, could be converted into a topic map authoring tool for working with data.

Second, the “jaguar” search example at 39:29 was particularly compelling. Definitely improves the usefulness of the search results but still working at the document level. The document level is the wrong level for search, unless you just like wasting time repeating what other people have already done.

Third, there are lots of other tools and resources at: http://ailab.ijs.si/. I am going to be slowly mining this site but if you encounter something really interesting, please make a comment or drop me a note.

Definitely a group to watch.

June 6, 2011

Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2011)

Filed under: Challenges,Conferences,Dataset,Semantic Web — Patrick Durusau @ 1:57 pm

Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2011)

Full Day Workshop in conjunction with the 10th International Semantic Web Conference 2011 23/24 October 2011, Bonn, Germany

Important Dates

Deadline for paper submission: 8 August 2011 23:59 (11:59pm) Hawaii time
Notification of Acceptance: 29 August 2011 23:59 (11:59pm) Hawaii time
Camera-ready version: 8 September 2011
Workshop: 23 or 24 October 2011

Abstract:

The goal of DeRiVE 2011 is to strengthen the participation of the semantic web community in the recent surge of research on the use of events as a key concept for representing knowledge and organizing and structuring media on the web. The workshop invites contributions to three central questions, and the goal is to formulate answers to these questions that advance and reflect the current state of understanding of events in the semantic web. Each submission will be expected to address at least one question explicitly, and, if possible, include a system demonstration. We have released an event challenge dataset for use in the preparation of contributions, with the goal of supporting a shared understanding of their impact. A prize will be awarded for the best use(s) of the dataset; but the use of other datasets will also be allowed.

See the CFP for questions papers must address.

Also note the anticipated release of a dataset:

We will release a dataset of event data. In addition to regular papers, we invite everybody to submit a Data Challenge paper describing work on this dataset. We welcome analyses, extensions, alignments or modifications of the dataset, as well as applications and demos. The best Data Challenge paper will get a prize.

The dataset consists of over 100,000 events from three sources: the music website Last.fm, and the entertainment websites upcoming.yahoo.com and eventful.com. All three are represented in the LODE schema. In addition to events, they contain artists, venues, and location and time information. Some links between the instances of the three datasets are provided.

Suggestions for modeling events in topic maps?
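
One possible starting point, before any topic map modeling: build an event in RDF with rdflib using the LODE vocabulary the dataset is said to use. The property names follow the LODE ontology’s documentation but should be treated as illustrative, and the URIs are invented.

from rdflib import Graph, Literal, Namespace, RDF, URIRef

LODE = Namespace("http://linkedevents.org/ontology/")
g = Graph()
event = URIRef("http://example.org/event/42")
g.add((event, RDF.type, LODE.Event))
g.add((event, LODE.atPlace, URIRef("http://example.org/venue/roxy")))
g.add((event, LODE.involvedAgent, URIRef("http://example.org/artist/band")))
g.add((event, LODE.atTime, Literal("2011-10-23")))  # LODE proper uses OWL-Time here
print(g.serialize(format="turtle"))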

June 1, 2011

Silk – A Link Discovery Framework for the Web of Data

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 6:52 pm

Silk – A Link Discovery Framework for the Web of Data

From the website:

The Web of Data is built upon two simple ideas: First, to employ the RDF data model to publish structured data on the Web. Second, to set explicit RDF links between data items within different data sources. Background information about the Web of Data is found at the wiki pages of the W3C Linking Open Data community effort, in the overview article Linked Data – The Story So Far and in the tutorial on How to publish Linked Data on the Web.

The Silk Link Discovery Framework supports data publishers in accomplishing the second task. Using the declarative Silk – Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked. These link conditions may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language. Silk accesses the data sources that should be interlinked via the SPARQL protocol and can thus be used against local as well as remote SPARQL endpoints.

Of particular interest are the comparison operators:

A comparison operator evaluates two inputs and computes their similarity based on a user-defined metric.
The Silk framework currently supports the following similarity metrics, which return a similarity value between 0 (lowest similarity) and 1 (highest similarity) each:

  • levenshtein([float maxDistance], [float minValue], [float maxValue]) – String similarity based on the Levenshtein metric.
  • jaro – String similarity based on the Jaro distance metric.
  • jaroWinkler – String similarity based on the Jaro-Winkler metric.
  • qGrams(int q) – String similarity based on q-grams (by default q=2).
  • equality – Returns 1 if the strings are equal, 0 otherwise.
  • inequality – Returns 0 if the strings are equal, 1 otherwise.
  • num(float maxDistance, float minValue, float maxValue) – Computes the numeric distance between two numbers and normalizes it using the threshold. Parameters: maxDistance (the similarity score is 0.0 if the distance is bigger than maxDistance); minValue, maxValue (the minimum and maximum values which occur in the data source).
  • date(int maxDays) – Computes the similarity between two dates (“YYYY-MM-DD” format). At a difference of maxDays the metric evaluates to 0 and progresses towards 1 as the difference decreases.
  • wgs84(string unit, float threshold, string curveStyle) – Computes the geographical distance between two points. Parameters: unit (the unit in which the distance is measured; allowed values: “meter” or “m” (default), “kilometer” or “km”); threshold (values bigger than the threshold result in 0, values below vary with the curveStyle); curveStyle (“linear” gives a linear transition; “logistic” uses the logistic function f(x) = 1/(1+e^(-x)), giving a softer curve with a slow slope at the start and end of the curve but a steep one in the middle).

Author: Konrad Höffner (MOLE subgroup of Research Group AKSW, University of Leipzig)

(better formatting is available at the original page but I thought the operators important enough to report in full here)
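
As a rough illustration of how such a metric maps string distances into the [0, 1] range, here is a sketch of a Levenshtein-based similarity with a maxDistance cutoff. This is one plausible normalization, not Silk’s actual implementation.

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def levenshtein_similarity(a, b, max_distance=3):
    # Scores 0.0 once the distance reaches maxDistance, 1.0 for equal strings.
    return max(0.0, 1.0 - edit_distance(a, b) / max_distance)

print(levenshtein_similarity("colour", "color"))   # 0.666...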

Definitely a step beyond opaque mappings between links. Note, for example, that the Silk – Link Specification Language declares the basis on which two or more resources are mapped together. More could be said, but this is a start in the right direction.

May 30, 2011

Semantic Web Dog Food (There’s a fly in my bowl.)

Filed under: Conferences,OWL,RDF,RDFa,Semantic Web — Patrick Durusau @ 6:59 pm

Semantic Web Dog Food

From the website:

Welcome to the Semantic Web Conference Corpus – a.k.a. the Semantic Web Dog Food Corpus! Here you can browse and search information on papers that were presented, people who attended, and other things that have to do with the main conferences and workshops in the area of Semantic Web research.

We currently have information about

  • 2133 papers,
  • 5020 people and
  • 1273 organisations at
  • 20 conferences and
  • 132 workshops,

and a total of 126886 unique triples in our database!

The numbers looked low to me until I read in the FAQ:

This is not just a site for ISWC [International Semantic Web Conference] and ESWC [European Semantic Web Conference] though. We hope that, in time, other metadata sets relating to Semantic Web activity will be hosted here — additional bibliographic data, test sets, community ontologies and so on.

This illustrates a persistent problem of the Semantic Web. This site has one way to encode the semantics of these papers, people, conferences and workshops. Other sources of semantic data on these papers, people, conferences and workshops may well use other ways to encode those semantics. And every group has what it feels are compelling reasons for following its choices and not the choices of others. Assuming they are even aware of the choices of others. (Discovery being another problem but I won’t talk about that now.)

The previous semantic diversity of natural language is now represented by a semantic diversity of ontologies and URIs. Now our computers can more rapidly and reliably detect that we are using different vocabularies. The SW seems like a lot of work for such a result. Particularly since we continue to use diverse vocabularies and more diverse vocabularies continue to arise.

The SW solution, using OWL Full:

5.2.1 owl:sameAs

The built-in OWL property owl:sameAs links an individual to an individual. Such an owl:sameAs statement indicates that two URI references actually refer to the same thing: the individuals have the same “identity”.

For individuals such as “people” this notion is relatively easy to understand. For example, we could state that the following two URI references actually refer to the same person:

<rdf:Description rdf:about="#William_Jefferson_Clinton">
  <owl:sameAs rdf:resource="#BillClinton"/>
</rdf:Description>

The owl:sameAs statements are often used in defining mappings between ontologies. It is unrealistic to assume everybody will use the same name to refer to individuals. That would require some grand design, which is contrary to the spirit of the web.

In OWL Full, where a class can be treated as instances of (meta)classes, we can use the owl:sameAs construct to define class equality, thus indicating that two concepts have the same intensional meaning. An example:

<owl:Class rdf:ID="FootballTeam">
  <owl:sameAs rdf:resource="http://sports.org/US#SoccerTeam"/>
</owl:Class>

One could imagine this axiom to be part of a European sports ontology. The two classes are treated here as individuals, in this case as instances of the class owl:Class. This allows us to state that the class FootballTeam in some European sports ontology denotes the same concept as the class SoccerTeam in some American sports ontology. Note the difference with the statement:

<footballTeam owl:equivalentClass us:soccerTeam />

which states that the two classes have the same class extension, but are not (necessarily) the same concepts.

Anyone see a problem? Other than requiring the use of OWL Full?

The absence of any basis for “…denotes the same concept as…”? I can’t safely reuse this axiom because I don’t know on what basis its author made such a claim. The URIs may provide further information that may satisfy me the axiom is correct, but that still leaves me in the dark as to why the author of the axiom thought it to be correct. Overly precise for football/soccer ontologies, you say, but what of drug interaction ontologies? Or ontologies that govern highly sensitive intelligence data?

So we repeat semantic diversity, create maps to overcome the repeated semantic diversity and the maps we create have no explicit basis for the mappings they represent. Tell me again why this was a good idea?

Social Data on the Web (SDoW2011)

Filed under: Conferences,Data,Semantic Web — Patrick Durusau @ 6:55 pm

Social Data on the Web (SDoW2011)

Important Dates:

Submission deadline: Aug 15, 2011 (23:59 pm Hawaii time, GMT-10)
Notification of acceptance: Sep 05, 2011
Camera-ready paper submission: Sep 15, 2011
Camera-ready proceedings: Oct 07, 2011
Workshop: Oct 23/24, 2011

From the website:

Aim and Scope

The 4th international workshop Social Data on the Web (SDoW2011) co-located with the 10th International Semantic Web Conference (ISWC2011) aims to bring together researchers, developers and practitioners involved in semantically-enhancing social media websites, as well as academics researching more formal aspects of these interactions between the Semantic Web and Social Web.

It is now widely agreed in the community that the Semantic Web and the Social Web can benefit from each other. On the one hand, the speed at which data is being created on the Social Web is growing at an exponential rate. Recent statistics showed that about 100 million Tweets are created per day and that Facebook now has 500 million users. Yet, some issues still have to be tackled, such as how to efficiently make sense of all this data, how to ensure trust and privacy on the Social Web, how to interlink data from different systems, whether it is on the Web or in the enterprise, or more recently, how to link social networks and sensor networks to enable Semantic Citizen Sensing.

Prior Proceedings:

SDoW2008

SDoW2009

SDoW2010

May 20, 2011

Seevl

Filed under: Dataset,Interface Research/Design,Linked Data,Music Retrieval,Semantic Web — Patrick Durusau @ 4:04 pm

Seevl: Reinventing Music Discovery

If you are interested in music or interfaces, this is a must stop location!

Simple search box.

I tried searching for artists, albums, types of music.

In addition to search results you also get suggestions of related information.

The Why is this related? link for related information was particularly interesting. It explains why additional information was offered for a particular search result.

Developers can access their data for non-commercial uses for free.

The simplicity of the interface was a real plus.

May 18, 2011

Datalift

Filed under: Dataset,Linked Data,Semantic Web — Patrick Durusau @ 6:42 pm

Datalift (also available in French)

From the webpage:

Datalift brings raw structured data coming from various formats (relational databases, CSV, XML, …) to semantic data interlinked on the Web of Data.

Datalift is an experimental research project funded by the French national research agency. Its goal is to develop a platform to publish and interlink datasets on the Web of data. Datalift will both publish datasets coming from a network of partners and data providers and propose a set of tools for easing the datasets publication process.

A few steps to data heaven

The project will provide tools facilitating each step of the publication process:

  • selecting ontologies for publishing data
  • converting data to the appropriate format (RDF using the selected ontology)
  • publishing the linked data
  • interlinking data with other data sources

The project is funded for three years, so it needs to hit the ground running.
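
The second step in the list above (converting raw data to RDF under a chosen ontology) is easy to sketch with rdflib; here CSV rows become FOAF descriptions. The file name and column names are invented.

import csv
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
with open("people.csv", newline="") as f:
    for row in csv.DictReader(f):   # expects columns: id, name
        person = URIRef("http://example.org/person/" + row["id"])
        g.add((person, RDF.type, FOAF.Person))
        g.add((person, FOAF.name, Literal(row["name"])))
g.serialize(destination="people.ttl", format="turtle")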

I am sure they would appreciate useful feedback.

May 3, 2011

PoolParty

Filed under: Linked Data,Semantic Web,SKOS,Thesaurus — Patrick Durusau @ 1:07 pm

PoolParty

From the website:

PoolParty is a thesaurus management system and a SKOS editor for the Semantic Web including text mining and linked data capabilities. The system helps to build and maintain multilingual thesauri providing an easy-to-use interface. PoolParty server provides semantic services to integrate semantic search or recommender systems into enterprise systems like CMS, web shops, CRM or Wikis.

I encountered PoolParty in the video Pool Party – Semantic Search.

The video glosses over a lot of difficulties, but what effective advertising doesn’t?

Curious if anyone is familiar with this group/product?


Update: 31 May 2011

Slides: Pool Party – Semantic Search

Nice slide deck on semantic search issues.

April 20, 2011

RDFa API and RDFa 1.1 Primer Drafts Updated

Filed under: RDF,RDFa,Semantic Web — Patrick Durusau @ 2:10 pm

The RDF Web Applications Working Group has published new Working Drafts.

RDFa API

RDFa 1.1 Primer

March 31, 2011

….object coreference on the semantic web (and a question)

Filed under: Conferences,Semantic Web — Patrick Durusau @ 3:41 pm

A self-training approach for resolving object coreference on the semantic web by Wei Hu, Jianfeng Chen, and Yuzhong Qu, all of Nanjing University, Nanjing, China.

Abstract:

An object on the Semantic Web is likely to be denoted with multiple URIs by different parties. Object coreference resolution is to identify “equivalent” URIs that denote the same object. Driven by the Linking Open Data (LOD) initiative, millions of URIs have been explicitly linked with owl:sameAs statements, but potentially coreferent ones are still considerable. Existing approaches address the problem mainly from two directions: one is based upon equivalence inference mandated by OWL semantics, which finds semantically coreferent URIs but probably omits many potential ones; the other is via similarity computation between property-value pairs, which is not always accurate enough. In this paper, we propose a self-training approach for object coreference resolution on the Semantic Web, which leverages the two classes of approaches to bridge the gap between semantically coreferent URIs and potential candidates. For an object URI, we firstly establish a kernel that consists of semantically coreferent URIs based on owl:sameAs, (inverse) functional properties and (max-)cardinalities, and then extend such kernel iteratively in terms of discriminative property-value pairs in the descriptions of URIs. In particular, the discriminability is learnt with a statistical measurement, which not only exploits key characteristics for representing an object, but also takes into account the matchability between properties from pragmatics. In addition, frequent property combinations are mined to improve the accuracy of the resolution. We implement a scalable system and demonstrate that our approach achieves good precision and recall for resolving object coreference, on both benchmark and large-scale datasets.

Interesting work.

In particular the use of property-value pairs in the service of discovering similarity.

So, why are users limited to owl:sameAs?

If machines can discover property-value pairs that identify “objects,” then why not enable users to declare property-value pairs that identify the same “objects?”

Such declarations could be used by both machines and users.
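
A toy version of what that could look like: the user declares which properties identify an object, and records agreeing on all of the declared property-value pairs are merged. This is a sketch of the idea, not the paper’s algorithm; the data is invented.

identifying = ("isbn",)   # user-declared identifying properties
records = [
    {"isbn": "0-00-000000-0", "title": "An Example Book"},
    {"isbn": "0-00-000000-0", "publisher": "Example Press"},
]

def identity_key(record):
    # Identity is the set of declared property-value pairs, not a single URI.
    return tuple(sorted((p, record[p]) for p in identifying if p in record))

merged = {}
for record in records:
    merged.setdefault(identity_key(record), {}).update(record)
print(list(merged.values()))   # one object, title and publisher combined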

March 30, 2011

State of the LOD Cloud

Filed under: LOD,RDF,Semantic Web — Patrick Durusau @ 12:36 pm

State of the LOD Cloud

A more complete resource than the one I referenced in The Linking Open Data cloud diagram.

I haven’t seen any movement towards solving any of the fundamental identity issues with the LOD cloud.

On the other hand, topic mappers can make use of these URIs as names and specify other data published with those URIs to form an actual identification.

One that is reliably interchangeable with others.

I think the emphasis is on URIs being dereferenceable.

No one says what happens after a URI is dereferenced but that’s to avoid admitting that a URI is insufficient as an identifier.

March 23, 2011

RDF and Semantic Web

Filed under: RDF,Semantic Web,Topic Maps — Patrick Durusau @ 6:03 am

RDF and Semantic Web: can we reach escape velocity?

Jeni Tennison’s slides from TPAC 2010 are an interesting insight into how an “insider” views the current state of RDF and the Semantic Web.

I disagree with her on a couple of crucial points:

RDF’s only revolution, but the key one, is using URIs to name things, including properties and classes

identifying things with URIs does two really useful things

  • disambiguates, enabling joins with other data using same URI
    • mash-ups beyond mapping things on a Google Map
  • provides something at the end of the URI
    • extra information, explanation, context
    • in a basic entity-attribute-value model that enables combination without either up-front agreement or end-user jiggerypokery

First, the “identifying things with URIs” is re-use of a very old idea, the perfect language, which has a universal and unbroken record of failure. (see my Blast from the Past and citations therein.)

Second, how is combination possible without either up-front agreement or end-user jiggerypokery?

Combining information without either up-front agreement or end-user jiggerypokery is why we get such odd search results now.

Let’s take a simple example. Search for “democracy” and see what results you get.

Now, do you really think that “democracy” (limiting my remarks to the US at the moment) from documents in the 18th and 19th centuries means the same thing as “democracy” after the fall of slavery but prior to women getting the right to vote? Or does it mean the same thing as it does today? Or does it mean the same thing as its use in Egypt, where classes other than the moneyed ones may be favored?

No doubt you will say that someone could create URIs for all those senses of democracy, which is true, but the question is will we use them consistently? The answer to that has been no up to this point.

People are inconsistent, semantically speaking and there is no showing that is going to change.

Which brings me to the second major area of my disagreement.

RDF and the Semantic Web are failing (present tense) because they are the answer to a problem RDF and Semantic Web followers are interested in solving.

But not the answer to problems that interest anyone else.

At least not enough to pay the price of RDF and the Semantic Web.

To be fair, topic maps faces the same issue.

But at least topic maps started off with a particular problem (combining indexes) and then expanded to be a general solution.

The Semantic Web started off as a general solution in search of problems that would justify the cost of adoption. Not the best strategy.

I do like Jeni’s emphasis on assisting governments to make their data usefully available. That is a good thing and one that we agree on.

Both topic maps and RDF/SW need to analyze the problems of governments (and others) in making such data available.

Then, understanding the issues they face, derive as low cost a solution as possible within their paradigms to solve that problem.

That could involve URIs, for example, assuming there was a “URI + N properties identify a subject” protocol.

Not that such a protocol makes us any more semantically consistent, but having more than one property to be inconsistent about may (emphasis on may) reduce the range of semantic inconsistency.

Take my democracy example. If I had http://NotRealURI/democracy with a range property of 1800-1850, and matching my sense of democracy required matching both the URI and the date range, that would be a step towards reducing semantic inconsistency.
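
The democracy example as code, with invented values: identity requires the URI and the date-range property to match, not the URI alone.

def same_subject(a, b, keys=("uri", "range")):
    # Every listed property must match for the subjects to be the same.
    return all(a.get(k) == b.get(k) for k in keys)

d1 = {"uri": "http://NotRealURI/democracy", "range": "1800-1850"}
d2 = {"uri": "http://NotRealURI/democracy", "range": "1920-1970"}
print(same_subject(d1, d2))   # False: same URI, different senses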

It is the lack of a requirement that more than one property be matched for identity that underlies the technical failure of RDF/Semantic Web.

Its social failure is in not answering questions that are of interest to developers and ultimately users.

Providing useful answers to problems, seen by users as problems, is the way forward for both topic maps and RDF/Semantic Web.

March 14, 2011

Sixth International Conference on Knowledge Capture – K-Cap 2011

Sixth International Conference on Knowledge Capture – K-Cap 2011

From the website:

In today’s knowledge-driven world, effective access to and use of information is a key enabler for progress. Modern technologies not only are themselves knowledge-intensive technologies, but also produce enormous amounts of new information that we must process and aggregate. These technologies require knowledge capture, which involves the extraction of useful knowledge from vast and diverse sources of information as well as its acquisition directly from users. Driven by the demands for knowledge-based applications and the unprecedented availability of information on the Web, the study of knowledge capture has a renewed importance.

Researchers that work in the area of knowledge capture traditionally belong to several distinct research communities, including knowledge engineering, machine learning, natural language processing, human-computer interaction, artificial intelligence, social networks and the Semantic Web. K-CAP 2011 will provide a forum that brings together members of disparate research communities that are interested in efficiently capturing knowledge from a variety of sources and in creating representations that can be useful for reasoning, analysis, and other forms of machine processing. We solicit high-quality research papers for publication and presentation at our conference. Our aim is to promote multidisciplinary research that could lead to a new generation of tools and methodologies for knowledge capture.

Conference:

25 – 29 June 2011
Banff Conference Centre
Banff, Alberta, Canada

Call for papers has closed. Will try to post a note about the conference earlier next year.

Proceedings from previous conferences available through the ACM Digital Library – Knowledge Capture.

Let me know if you have trouble with the ACM link. I sometimes don’t get removal of all the tracing cruft off of URLs correct. There really should be a “clean” URL option for sites like the ACM.

Personal Semantic Data – PSD 2011

Filed under: Conferences,RDF,Semantic Web,Semantics — Patrick Durusau @ 6:51 am

Personal Semantic Data – PSD 2011

From the website:

Personal Semantic Data is scattered over several media, and while semantic technologies are already successfully deployed on the Web as well as on the desktop, data integration is not always straightforward. The transition from the desktop to a distributed system for Personal Information Management (PIM) raises new challenges which need to be addressed. These challenges overlap areas related to human-computer interaction, user modeling, privacy and security, information extraction, retrieval and matching.

With the growth of the Web, a lot of personal information is kept online, on websites like Google, Amazon, Flickr, YouTube, Facebook. We also store pieces of personal information on our computers, on our phones and other devices. All the data is important, that’s why we keep it, but managing such a fragmented system becomes a chore on its own instead of providing support and information for doing the tasks we have to do. Adding to the challenge are proprietary formats and locked silos (online or offline in applications).

The Semantic Web enables the creation of structured and interlinked data through the use of common vocabularies to describe it, and a common representation – RDF. Through projects like Linking Open Data (LOD), SIOC and FOAF, large amounts of data is available now on the Web in structured form, including personal information about people and their social relationships. Applying semantic technologies to the desktop resulted in the Semantic Desktop, which provides a framework for linking data on the desktop.

The challenge lies in extending the benefits of the semantic technologies across the borders of the different environments, and providing a uniform view of one’s personal information regardless of where it resides, which vocabularies were used to describe it and how it is represented. Sharing personal semantic data is also challenging, with privacy and security being two of the most important and difficult issues to tackle.

Important Dates:

15 April 2011 – Submission deadline
30 April 2011 – Author notification
10 May 2011 – Camera-ready version
26 June 2011 – Workshop day

I think the secret of semantic integration is this: the more information that becomes available, the more heterogeneous the systems and information become, and the greater the need for topic maps.

Mostly because replacing that many systems in a coordinated way, over the vast diversity of interests and users, simply isn’t possible.

Would be nice to have a showing of interest by topic maps at this workshop.

March 10, 2011

Pentaho BI Suite Enterprise Edition (TM/SW Are You Listening?)

Filed under: BI,Linked Data,Marketing,Semantic Web — Patrick Durusau @ 8:12 am

Pentaho BI Suite Enterprise Edition

From the website:

Pentaho is the open source business intelligence leader. Thousands of organizations globally depend on Pentaho to make faster and better business decisions that positively impact their bottom lines. Download the Pentaho BI Suite today if you want to speed your BI development, deploy on-premise or in the cloud or cut BI licensing costs by up to 90%.

There are several open source offerings like this; Talend is another one that comes to mind.

I haven’t looked at its data integration in detail but suspect I know the answer to the question:

Say I have an integration of some BI assets using Pentaho and other BI assets integrated using Talend, how do I integrate those together while maintaining the separately integrated BI assets?

Or for that matter, how do I integrate BI that has been gathered and integrated by others, say Lexis/Nexis?

Interesting too to note that this is the sort of user slickness and ease that topic maps and (cough) linked data (see, I knew I could say it) face in the marketplace.

Does it offer all the bells and whistles of more sophisticated subject identity or reasoning approaches?

No, but if it offers all that users are interested in using, what is your complaint?

Both topic maps and semantic web/linked data approaches need to listen more closely to what users want.

As opposed to deciding what users need.

And delivering the latter instead of the former.

February 23, 2011

AI Mashup Challenge 2011

Filed under: Mashups,RDF,Semantic Web — Patrick Durusau @ 3:01 pm

AI Mashup Challenge 2011

Due date: 1 April 2011

From the website:

The AI mashup challenge accepts and awards mashups that use AI technology, including but not restricted to machine learning and data mining, machine vision, natural language processing, reasoning, ontologies and the semantic web.
Imagine for example:

  • Information extraction or automatic text summarization to create a task-oriented overview mashup for mobile devices.
  • Semantic Web technology and data sources adapting to user and task-specific configurations.
  • Semantic background knowledge (such as ontologies, WordNet or Cyc) to improve search and content combination.
  • Machine translation for mashups that cross language borders.
  • Machine vision technology for novel ways of aggregating images, for instance mixing real and virtual environments.
  • Intelligent agents taking over simple household planning tasks.
  • Text-to-speech technology creating a voice mashup with intelligent and emotional intonation.
  • The display of PubMed articles on a map based on geographic entity detection referring to diseases or health centers.

The emphasis is not on providing and consuming semantic markup, but rather on using intelligence to mashup these resources in a more powerful way.

This looks like an opportunity for an application that assists users in explicit identification or confirmation of identification of subjects.

Rather than auto-correcting, human-correcting.

Assuming we can capture the corrections, wouldn’t that mean that our apps would incrementally get “smarter?” Rather than starting off from ground zero with each request? (True, a lot of analysis goes on with logs, etc. Why not just ask?)
